lrsarith: a small fixed/hybrid arithmetic C library
David Avis
School of Informatics, Kyoto University, Kyoto, Japan and School of Computer Science, McGill University, Montréal, Québec, Canada

Charles Jordan
Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan

February 1, 2021
Abstract
We describe lrsarith, a small fixed precision and hybrid arithmetic C library for integers and rationals that we developed for use in the lrslib library for polyhedral computation. Using a generic set of operations, a program can be compiled with either 64-bit or 128-bit (if available) fixed precision, with an extended precision library such as GMP, or with the built-in MP routines. A simple scheme checks for overflow and either terminates the program or, in hybrid mode, changes to a higher precision arithmetic. Implementing these arithmetics in lrslib resulted in only minimal changes to the original code. We give computational results using lrs and mplrs, vertex/facet enumeration codes in lrslib, using 64 and 128 bit fixed integer arithmetic with and without overflow checking, GMP arithmetic, lrsarith hybrid arithmetic with both GMP and MP, and FLINT hybrid arithmetic. We give a small self-contained example C program using the lrsarith package in both fixed precision and hybrid mode.
When writing mathematical software, a fundamental choice is the arithmetic to use. It is easy to be seduced by the performance of native integers, but the possibility of incorrect output caused by overflow can be a major concern. To guarantee correctness, in many cases one must rely on multiprecision arithmetic such as that provided by the GNU Multiple Precision (GMP) Arithmetic Library. This comes with a large performance penalty compared to native integers on inputs where they suffice, and it is tempting to maintain multiple versions with different arithmetic.

Until version 7, this was the situation with lrslib, a library of programs for polyhedral computation including e.g. lrs for vertex/facet enumeration and redund for redundancy removal. From the outset lrslib used exact arithmetic and could be compiled with either fixed precision C integer arithmetic (without overflow checking) or with an internal extended precision arithmetic library. For safety, the default was to use extended precision. Later versions could also be compiled with the GMP package and, more recently, with the hybrid arithmetic package available in FLINT. The fixed precision versions perform several times faster than the extended precision versions but can produce incorrect results if overflow occurs.

For this reason we developed a simple scheme to detect the possibility of overflow occurring while using fixed precision arithmetic. The code could then halt and allow the user to restart using higher or extended precision arithmetic. This led to the development of a hybrid scheme in lrslib to allow automatic restart with the next higher precision arithmetic. This gives users the performance of native arithmetic in most cases where it is safe, while keeping the correctness provided by the extended precision versions.
The restart is from (very near) the point where a possible overflow was detected. Implementing this in lrslib resulted in only minimal changes to the original code and gives a significant performance improvement.

The main purpose of this paper is to explain how overflow checking and hybrid arithmetic were implemented and to give numerical results for a case study involving vertex/facet enumeration. This explains the improved performance of recent versions of lrslib. The arithmetic code may be useful outside the lrslib library, so we created the much smaller, independent lrsarith package which contains only the arithmetic code. It can be downloaded from [1].

Implementing arithmetic in lrsarith allows developers to postpone the choice of arithmetic until compile time. With some additional effort to handle overflows and restarting from intermediate points of the computation, it also allows the developer to obtain the performance of native integers when they suffice and the correctness of an extended precision library when required.
When comparing various codes for vertex/facet enumeration [2], we noted that the hybrid arithmetic available in normaliz could give a large speedup for certain inputs. The default in normaliz is to begin the computation in 64-bit integers and to restart using GMP extended precision if overflow occurs. However, the restart is from the beginning of the computation, which can hurt performance on certain inputs where the repeated work is significant (see the comparison in [2]).

The Fast Library for Number Theory (FLINT) [5] is a collection of modules that support various arithmetic operations. In particular, we use the fmpz module for extended precision arithmetic that balances "very efficient" performance on small integers with performance similar to GMP for large integers. We will compare lrsarith performance with FLINT in Section 4.4. Although FLINT is easier to develop with, since switching arithmetic is transparent to the user, lrsarith can give higher performance on small integers. FLINT contains much additional functionality beyond extended precision arithmetic, but usually requires the user to install it.

Porta [4] is a collection of routines related to polytopes, including vertex/facet enumeration. It supports both fixed precision and its own multiprecision arithmetic and was included in the comparison [2] mentioned earlier.

The standard general purpose library for extended precision arithmetic with large integers is GMP. It is highly optimized for many platforms and was the default in lrslib for some time. Historically, lrslib used its own multiprecision arithmetic (MP) and this is still supported. Comparisons are given in Section 4.4.

The paper contains two parts that can essentially be read independently. The first part gives a detailed explanation of how we implement the various arithmetics, with a simple but complete example program that can be used as a template for developing with lrsarith.
In Section 2 we show how overflow checking is implemented in our fixed precision lrslong library. We give a simple example to show how a single C code can be compiled with 64-bit, 128-bit, MP or GMP arithmetic. In the first three cases the program terminates when the possibility of overflow is detected. In Section 3 we continue the example to show how a program can restart after overflow has been detected, using the next level of arithmetic. This form of hybrid arithmetic is used in lrslib. Computational results are presented in both sections for a larger example involving Collatz sequences.

The second part of the paper begins in Section 4, where we present our case study and give computational results comparing various arithmetic packages that can be used in lrslib. These include fixed integer arithmetic with and without overflow checking, GMP arithmetic, two versions of lrsarith hybrid arithmetic and FLINT hybrid arithmetic.
Arithmetic is handled in lrsarith by defining a generic integer type and a set of generic operations. A generic integer a, integer vector v and integer matrix A are defined

lrs_mp a;  lrs_mp_vector v;  lrs_mp_matrix A;

allocated

lrs_alloc_mp(a);  v=lrs_alloc_mp_vector(n);  A=lrs_alloc_mp_matrix(m,n);

and freed

lrs_clear_mp(a);  lrs_clear_mp_vector(v,n);  lrs_clear_mp_matrix(A,m,n);

where m and n are the dimensions. The types are assigned at compile time depending on the arithmetic used. For 64-bit and 128-bit integers they are assigned to fixed length integers, for MP to a fixed length array, and for GMP (https://gmplib.org/) to the GMP integer type:

typedef long long lrs_mp[1];               /* one long integer   */
typedef __int128 lrs_mp[1];                /* one 128-bit integer */
typedef long long lrs_mp[MAX_DIGITS + 1];  /* one MP integer     */
typedef mpz_t lrs_mp;                      /* one GMP integer    */

Operations using lrs_mp integers are written generically. Typical examples are as follows, where a, b, c, d, e are lrs_mp integers, i is a long long, and the equivalent C code is given in parentheses:

itomp(i,a);           (a=i)
copy(b,a);            (b=a)
addint(a,b,c);        (c=a+b)
mulint(a,b,c);        (c=a*b)
divint(a,b,c);        (c=a/b, a=a%b)
linint(a,ka,b,kb);    (a=a*ka+b*kb for ka, kb long long integers)
qpiv(a,b,c,d,e);      (a=(a*b-c*d)/e where the division is exact)

A small C program, fixed.c, using some of these operations is given in Appendix A. It reads in an integer k and attempts to repeatedly square it six times. We will discuss the program later in this section. Generic operations are either assigned at compile time by macros, or by C code in the case of MP. In this way arithmetic operations are essentially the same as using the underlying arithmetic package.

The problem is that overflow is not detected in 64-bit and 128-bit arithmetic. We solve this problem by a technique we call lazy overflow handling. Using very light extra computation we detect when it is possible that an overflow may occur.
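Before turning to the overflow tests, the macro binding mentioned above can be illustrated with a small sketch. The macro bodies and the demo() helper here are ours, not the actual lrslong.h definitions, which differ in detail:

```c
#include <stdio.h>

/* Illustrative sketch only: binding the generic operations to native
   64-bit arithmetic with macros, as the unchecked -DLRSLONG build does. */
typedef long long lrs_mp[1];

#define itomp(i, a)     (*(a) = (i))          /* a = i     */
#define copy(b, a)      (*(b) = *(a))         /* b = a     */
#define addint(a, b, c) (*(c) = *(a) + *(b))  /* c = a + b */
#define mulint(a, b, c) (*(c) = *(a) * *(b))  /* c = a * b */

/* demo: compute 6 * 7 + 1 through the generic interface */
long long demo(void)
{
  lrs_mp a, b, c, one;
  itomp(6, a);
  itomp(7, b);
  itomp(1, one);
  mulint(a, b, c);    /* c = 42 */
  addint(c, one, c);  /* c = 43 */
  return c[0];
}
```

With GMP the same generic names would instead expand to calls such as mpz_mul, which is what allows a single source file to be compiled against each package.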
Control is then given to an overflow handler that either halts computation or restarts the program using arithmetic with higher precision. While restarting from the beginning of execution is simplest, later we will see options for restarting from intermediate checkpoints, allowing programs to use faster arithmetic for initial portions of computations in some cases.

To incorporate lazy overflow handling we define some constants that depend on the word size W, which is either 64 or 128 in lrsarith:

MAXDm = floor( sqrt( 2^(W-1) - 1 ) )
MAXDl = 2^(W/2-1) - 1
MAXDa = 2^(W-2) - 1

MAXDa is the constant used in testing for overflow when using addition or similar operations, MAXDl is used for linint, and MAXDm is used for multiplying. Three macros test whether the operands a and b are out of bounds. We claim that if the integer arithmetic would overflow then lrs_overflow is called. Since we are using signed integer arithmetic, this is equivalent to proving that the result (and intermediate values) in each case is at most 2^(W-1) - 1. Proceeding case by case:

• addint(a,b,c): |a|, |b| <= MAXDa and so |a + b| <= 2 * (2^(W-2) - 1) = 2^(W-1) - 2.

• mulint(a,b,c): |a|, |b| <= MAXDm and so |a * b| <= ( sqrt(2^(W-1) - 1) )^2 = 2^(W-1) - 1.

• linint(a,ka,b,kb): |a|, |b|, |ka|, |kb| <= MAXDl and so |a*ka + b*kb| <= 2 * (2^(W/2-1) - 1)^2 = 2^(W-1) - 2^(W/2+1) + 2.

Note that divint cannot overflow and that the analysis for qpiv is essentially the same as for linint, proving the claim.

Rational arithmetic is handled in lrsarith in a very simple way by representing the rational number a/b by two lrs_mp integers a and b. The operation reduce(a,b) divides a and b by their greatest common divisor. Arithmetic operations for rationals are based on the integer operations above. For example

mulrat(a,b,c,d,e,f);   (e/f = a/b * c/d, with e/f reduced)

is implemented by

mulint(a,c,e);  mulint(b,d,f);  reduce(e,f);

Overflow checking for rational arithmetic is inherited from the corresponding integer arithmetic.

We now look more closely at the code fixed.c for iterated squaring given in Listing 1 of Appendix A. Among the includes on lines 1–3 we have the lrsarith header file. On lines 7–9 is the function lrs_overflow that handles overflow processing. In this case it simply prints a message and halts the program. In the main routine we see that two lrs_mp variables are declared. These are allocated and cleared in lines 16 and 26. These are null operations for 64 or 128-bit arithmetic and MP but are needed in GMP. On line 15 we initialize the arithmetic package. The next lines read an integer k and attempt to iteratively square it six times.

Using the makefile in Listing 2 we can compile fixed.c with each of the arithmetic packages, depending on the compiler switches used. For 64 or 128-bit arithmetic the -DLRSLONG switch is set, with -DB128 also set for 128-bit arithmetic. The -DSAFE switch enables overflow checking and handling as described in this section. The files lrsarith.h and lrsarith.c use these switches to ensure that the correct arithmetic files are included. Lines 2 and 3 produce the binaries fixed1 and fixed2.
For illustrative purposes we also compile without the -DSAFE switch, obtaining fixed1n and fixed2n in lines 4 and 5. In this case no overflow checking is performed. With the -DMP switch set the MP version is produced. Finally, by setting the -DGMP switch we compile with the external GMP library, which must be preinstalled. For simplicity we assume that the necessary files are in standard locations; otherwise the locations need to be specified. We ran fixed1, fixed2, fixedmp and fixedgmp with the input k = 5, getting the following output, respectively:

5 25 625 390625 152587890625 overflow detected: halting
5 25 625 390625 152587890625 23283064365386962890625 overflow detected: halting
5 25 625 390625 152587890625 23283064365386962890625 542101086242752217003726400434970855712890625
5 25 625 390625 152587890625 23283064365386962890625 542101086242752217003726400434970855712890625

fixed1 can compute 5^16 before overflowing, and fixed2 can also compute 5^32. Both fixedmp and fixedgmp can compute 5^64. Running fixed1n and fixed2n with no overflow protection produces incorrect output. This can be observed by noting the three cases where the last digit is not 5.

We briefly compare performance of the various fixed arithmetic packages in order to evaluate the overhead used by overflow checking. While this can also be seen in the more extended comparison done in Section 4, here we use a simple code to further demonstrate the use of lrsarith.

We consider a problem that uses a great deal of relatively simple computations and little output. The Collatz conjecture [6, 7] is a famous open problem in number theory: given a number k, we replace k by 3k + 1 if k is odd and by k/2 if k is even. The conjecture states that repeating this process always reaches 1, and it has been verified computationally for a very large range of k [3]. Our goal is to compare arithmetic packages and not to computationally verify larger values, so we avoid techniques such as huge lookup tables for simplicity.

We consider the Collatz sequence from k as a path from the integer k to the value one. The set of such paths defines an infinite tree with root k = 1, and the conjecture holds if all integers appear in the tree. One way to make the tree finite is to give a parameter max and consider the tree of all paths to the root that do not contain an integer greater than max. This finite tree can be generated from the root, without using memory to store its nodes, using the reverse search procedure. We wrote a code coll to generate the tree in this way. As long as max is small enough, no overflows occur when the various arithmetic packages are compiled with coll.
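The tree generation just described can be sketched in plain C. This is a simplified stand-in using native unsigned integers and our own function name coll_nodes, not the actual coll code, and it assumes the values stay well below the word size, which is exactly the limitation that lrsarith removes:

```c
/* Count the nodes of the finite reverse Collatz tree rooted at 1 whose
   values do not exceed max.  The predecessors of k under the Collatz map
   are 2k (always) and (k-1)/3 (when that is an odd integer, skipping the
   trivial 4 -> 1 back edge).  Depth-first search needs no stored nodes:
   the tree structure itself drives the search, as in reverse search. */
static unsigned long long coll_nodes(unsigned long long k,
                                     unsigned long long max)
{
  unsigned long long count = 1;          /* count k itself */
  if (2 * k <= max)
    count += coll_nodes(2 * k, max);     /* even predecessor */
  if (k % 3 == 1 && k > 4) {             /* k = 4 leads back to the root */
    unsigned long long j = (k - 1) / 3;
    if (j % 2 == 1)
      count += coll_nodes(j, max);       /* odd predecessor */
  }
  return count;
}
```

For example, coll_nodes(1, 20) counts the eleven integers whose paths to 1 avoid values above 20. The recursion depth is bounded by the tree height, which is fine for small max; for large trees a stack-free traversal, as in coll, is preferable.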
The binary collflint uses FLINT arithmetic (version 2.6.3) and the other binaries are labelled in the same way as for fixed.c above.

Table 1: Collatz tree generation, times in seconds (mai48). Columns: max, nodes, coll1, coll1n, coll2, coll2n, collgmp, collflint, collmp.

The values in Table 1 are small enough to avoid overflows, but one can see that overflow checking in lrslong results in only minor overhead. The relative performance of the packages will vary between machines, mix of arithmetic operations, compilers and versions of extended precision libraries; however, these results give a sample of what can be obtained. It is also worth noting that the multiprecision arithmetics are generally focused more on larger operands. In any case, for this computation on this machine, 128-bit arithmetic is roughly eight times slower than 64-bit arithmetic. FLINT arithmetic is somewhat faster than GMP (version 6.0) but slower than the dedicated fixed precision arithmetics. This is generally reasonable: FLINT makes more effort to obtain good performance on small operands, but lrsarith fixed arithmetic is essentially the same as using native arithmetic.

(See also David Barina's page on the current status: https://pcbarina.fit.vutbr.cz/. The code coll is contained in lrsarith-010. mai48: 2x AMD EPYC7552 2.3GHz, 48 cores, 512 GB memory, CentOS 7.6, gcc 4.8.5.)

Hybrid arithmetic
The fixed arithmetic versions described in the previous section are very simple to implement and easy to use, giving good performance for inputs that do not require very long integers. If overflow occurs, the user can manually rerun the job with a higher precision arithmetic package. However, it is certainly more convenient to switch automatically from one arithmetic to another with higher precision when overflow could occur. In this section we describe how this can be done using lrsarith, while making only minor modifications to the original user code.
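The overall control flow can be sketched as a standalone toy. Everything here, including DEFINE_RUN, run64, run128 and the limits, is illustrative and ours; the real code in Appendix B compiles a single run() once per lrsarith arithmetic and distinguishes the copies by name mangling rather than a macro-generated body. The sketch uses the gcc/clang __int128 extension, as lrsarith does for 128-bit mode:

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf buf1;   /* restart point for overflow recovery */

/* stand-in for the lrsarith overflow handler: unwind to setjmp */
void lrs_overflow(int parm)
{
  longjmp(buf1, 1);
}

/* Generate one squaring routine per "arithmetic": square k six times,
   lazily checking the operand before each multiplication. */
#define DEFINE_RUN(name, type, limit)                        \
  int name(long k)                                           \
  {                                                          \
    type b = (type)k;                                        \
    int i;                                                   \
    if (setjmp(buf1) == 0) {                                 \
      for (i = 1; i <= 6; i++) {                             \
        if (b > (limit))        /* lazy overflow test */     \
          lrs_overflow(1);                                   \
        b = b * b;                                           \
      }                                                      \
      return 0;                 /* all six squarings done */ \
    }                                                        \
    return 1;                   /* possible overflow */      \
  }

DEFINE_RUN(run64, long long, 3037000499LL)       /* ~sqrt(2^63)  */
DEFINE_RUN(run128, __int128, (__int128)1 << 63)  /* ~sqrt(2^127) */

/* driver: escalate precision on demand, as the hybrid main does */
int hybrid(long k)
{
  if (run64(k) == 0)  return 64;
  if (run128(k) == 0) return 128;
  return 0;   /* would fall back to GMP or MP here */
}
```

With k = 5 this escalates exactly as in Section 2: run64 gives up after 5^16, run128 after 5^32, and only extended precision can finish.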
We illustrate our approach using the example program of the previous section that reads an integer k and successively squares it. The hybrid code is shown in Appendix B. Lines 9–29 contain the function run, which is essentially the function main from Appendix A. Instead of reading the integer k, it is passed to run as a parameter.

The major change is the use of setjmp to allow lrs_overflow to return control to the program that calls run. This is achieved via the variable buf1 defined on line 7. The main loop (lines 18–25) of run is enclosed by a test on line 17. If lrs_overflow (lines 31–33) is called by the arithmetic package, a long jump is performed to the statement immediately after the end of the enclosed loop, i.e. line 26. This prints a warning and returns to the routine that called run.

In fact there are three routines that call run, namely run1, run2, and runmp (GMP or MP), but only the first two can trigger a call to lrs_overflow. The function run itself will need to be compiled three times, once for each package. The arithmetic package used and the routine to call run are selected at compile time by compiler directives in the makefile and on lines 35–43, respectively. The problem now arises that since run will be compiled three times, the three versions will need different names, a process known as name mangling. In fact all of the routines that use lrs_mp will also need name mangling. The particular scheme used in lrsarith was suggested by David Bremner and is illustrated in line 5. A unique suffix suf, defined in the arithmetic header file used, is added to the function run. These define lines will be needed for each user supplied routine. The define for lrs_overflow is contained in lrslong.h.

Listing 5 contains the main program, with its header file given in Listing 4. It begins by reading the parameter k and by calling run1, which uses 64-bit arithmetic.
If overflow occurs, a return code of 1 is triggered by lrs_overflow, causing main to call run2 using 128-bit arithmetic. If this in turn overflows, a final call is made to runmp, which uses GMP. In the makefile given in Listing 6, lines 4–6 compile the arithmetic libraries. Lines 7–9 compile hybridlib.c three times, once with each arithmetic library. Finally, line 11 combines everything into a single executable hybrid. (For simplicity we have omitted the lines which produce a hybrid executable with MP, but they are in the lrsarith makefile.) Running hybrid with k = 5 produces the squarings shown in Section 2, restarting with the next arithmetic each time overflow is detected.

We continue the example of generating Collatz trees introduced in Section 2.2. This time we use much larger values of max in order to create overflow conditions in both 64-bit and 128-bit arithmetic. As the trees now become extremely large, we terminate the programs after a fixed number of nodes have been generated. The program is deterministic, so in each case the same nodes are traversed. The program coll is the hybrid code combining coll1, coll2 and collgmp, created in the same way as hybrid described above. The results are shown in Table 2. On each line the final arithmetic used in coll is shown in blue. It is interesting to observe that for each of these three runs collgmp runs in roughly the same time and that, for the largest value of max, this is faster than the 128-bit arithmetic coll2 (and hence coll).

Table 2: Collatz tree generation, times in seconds (mai48). Columns: max, coll (hybrid), coll1, coll2, collgmp, collmp, collflint (hybrid). (o) = possible overflow detected.

The simple scheme described in this section is adequate in many applications but has a number of disadvantages. First, memory allocated during the run for a given arithmetic may not be freed after an overflow occurs, causing a memory leak. More importantly, the simple scheme here restarts from the start of program execution. Forgetting the past can be helpful, but this can hurt performance if overflow occurs near the end of program execution.
It would be much better not to redo work that has already been done, especially with slower arithmetic.

In applications such as lrslib, these are serious issues. The definition of lrs_mp depends on the arithmetic, so offsets into structures containing lrs_mp can change after switching arithmetic. Handling overflows appropriately is therefore more complicated than simply forgetting the past and restarting, as was described here.

The solution used in lrslib was to use a global variable to point to the allocated data structures. This allows lrs_overflow to free the data after overflow is detected, before switching to the next arithmetic. A further improvement in lrslib was to allow the program to restart at the point in the calculation where the overflow was detected. Again this was done using a global variable, and also the ability of the original code to restart; fortunately this was already available. In other programs it may be necessary to add periodic checkpoints, and it can be helpful to separate structures that contain lrs_mp from those that do not.

When comparing various codes for vertex/facet enumeration [2], we noted that hybrid arithmetic could give a large speedup for certain inputs, but that this was not available in lrslib at the time. We now turn our attention to the original motivation for developing the hybrid version of lrsarith: obtaining these speedups in lrslib v. 7. We begin by introducing the basics of vertex and facet enumeration. Then we explain the parallel hybrid version mplrs, since this handles overflow somewhat differently than hybrid lrs. In Section 4.3 we describe the experimental setup and in Section 4.4 we present a comparison between the different arithmetic packages as used in lrs and mplrs.

A convex polyhedron P can be represented by either a list of vertices and extreme rays, called a V-representation, or a list of its facet defining inequalities, called an H-representation.
The vertex enumeration problem is to convert an H-representation to a V-representation. The computationally equivalent facet enumeration problem performs the reverse transformation. For further background see Ziegler [8]. We will use the lrs program in lrslib to experiment with various instances of these problems. A comparison with other codes, including mplrs, a parallel version of lrs, to solve these problems can be found in [2].

The input is represented by an m by n matrix. For a vertex enumeration problem this is a list of m inequalities in n - 1 variables defining P. A vertex (resp. extreme ray) is the intersection of a set of n - 1 (resp. n - 2) input inequalities, taken as equations, that satisfies the remaining inequalities. A major difficulty is caused by degeneracy, which occurs when more than n - 1 inequalities meet at a vertex; in this case lrs generates multiple representations of the vertex, known as bases. For a facet enumeration problem the input is a list of the vertices of P, each beginning with a 1 in column one (extreme rays would be indicated by a zero in column one). A facet is defined by a set of n - 1 affinely independent vertices. When degeneracy occurs, a facet may contain more than n - 1 vertices and lrs will generate the facet multiple times, each known as a basis. The vertex or facet is output when the index-wise lexicographically minimum basis is found.

mplrs

On detecting a possible overflow, the hybrid version of lrs switches to the next highest precision arithmetic package, which it uses from that point onwards. The sequential implementation was parallelized as mplrs [2], which dynamically partitions the work and provides a series of subproblems to multiple lrs workers. If each of these workers was a hybrid lrs process, it would start each problem with the fastest arithmetic and switch to higher precision when detecting a possible overflow. This has a few drawbacks:

1. Duplicate output: it is possible that when restarting after an overflow, we reprint the most recent line of output.

2. Performance: if overflow occurs, it usually occurs often in the run. There is overhead in restarting and switching arithmetic, so we would prefer to avoid frequent switches.

In the hybrid version of mplrs, each of the workers is a fixed arithmetic process that returns to mplrs when possible overflow is detected. At that point, mplrs re-initializes the worker in question using the next available arithmetic. All output produced by the subproblem is discarded and the worker restarts from the beginning (of the subproblem).
The worker uses the new arithmetic from that point onwards. This approach avoids duplicate output: by default, hybrid mplrs holds output produced by a worker if its arithmetic could overflow, flushing it only when the job ends. It also resolves the second problem: each worker can only overflow and switch arithmetic twice in the overall run, the same as hybrid lrs. In addition, if overflow is a rare event then it is possible for only some workers to overflow. This makes larger speedups possible compared to hybrid lrs.

There is overhead in two areas. First, when overflow is detected the worker restarts its job. This overhead is limited because jobs are usually very small and workers can only overflow at most twice during the run. Second, since mplrs traverses different parts of the reverse search tree in a different order compared to lrs, it is possible for overflow to occur earlier than it would in lrs. This could hurt performance if lrs overflows only at the end, but could also help performance by avoiding early lrs overflows.

One complication is that for volume computation, mplrs internally uses the maximum precision arithmetic available. This means that mplrs may not agree with the worker process on arithmetic. Communication between the worker process and mplrs therefore avoids using lrs_mp integers. After checkpointing and restarting, hybrid mplrs begins again with the fastest arithmetic available. Checkpoint files are compatible between the various arithmetic packages.

The polytopes we tested are described in Table 3 and, except for two new problems, were previously described and used in [2]. The problems range from non-degenerate to highly degenerate polyhedra. This table includes the results of an lrs run on each polytope, as lrs gives the number of bases in a symbolic perturbation of the polytope. The new problems are:

• cp7 is the cut polytope for 7 points, which has input coefficients 0 or 1 and is highly degenerate.
• p8-6 is related to the holographic cone studied in physics and is an extension of the eight pointcut polytope. Input coordinates are 0, 1 or -1 and it is the most degenerate problem studied. See e.g., Figure 3(c,d) in [2].
We include a column labelled degeneracy, which is the number of bases divided by the number of vertices (or facets) output, rounded to the nearest integer. We have sorted the table in order of increasing degeneracy. The horizontal line separates the non-degenerate from the degenerate problems. The corresponding input files are available by following the download link at [1]. Note that the input sizes are small, roughly comparable, and much smaller than the output sizes.
               Input                     Output                  lrs
Name    H/V     m    n  size        V/H   size       bases   secs  depth  degeneracy
c30      V     30   16  4.7K     341088  73.8M      319770     36     14       1
c40      V     40   21   12K   40060020  15.6G    20030010   7531     19       1
km22     H     44   23  4.8K    4194304   1.2G     4194304    234     22       1
perm10   H   1023   11   29K    3628800   127M     3628800    283     45       1
vf500    V    500    7   98K      56669    38M      202985    137     41       4
vf900    V    900    7   20K      55903   3.9M      264385     23     45       5
mit71    H     71   61  9.5K    3149579   1.1G    57613364  15474     20      18
fq48     H     48   19  2.1K     119184   8.7M     7843390     44     24      66
mit      H    729    9   21K       4862   196K     1375608    132    101     283
cp7      V     64   22    3K     116764   7.4M   308644212   4071     41    2643
bv7      H     69   57  8.1K       5040   867K    84707280   1256     17   16807
p8-6     H    154   92   36K       4452   1.2M   110640628  16152     63   24852
Table 3: Polytopes tested and lrs v. 7.1 times (mai48)

In this section we give numerical results on using lrslib to solve the vertex enumeration problems described in the previous section with the various arithmetic packages in lrsarith. We test the following suite of codes which, apart from the exception noted below, are available in lrslib v. 7.1:

• lrs1: 64-bit fixed arithmetic with overflow checking.
• lrs2: 128-bit fixed arithmetic with overflow checking.
• lrsgmp: GMP arithmetic (version 6.0). Comparable to default lrs v. 6.2 and earlier.
• lrs: lrsarith hybrid arithmetic. Starts with lrs1 and switches to lrs2 (if available) and finally lrsgmp. Default version of lrs.
• lrsMP: As lrs except in the final step internal MP arithmetic is used. Available in v. 7.2 or on request.
• lrsflint: FLINT hybrid arithmetic (version 2.6.3).
• mplrsgmp: Multi-core version of lrsgmp, comparable to default mplrs v. 6.2 and earlier.
• mplrs: Hybrid multi-core version of lrs. Default version of mplrs.

Also available in lrslib v. 7.1 are the parallel versions mplrs1, mplrs2, and mplrsflint of the corresponding lrs codes above.

The results of the tests are shown in Table 4. The programs lrs1 and lrs2 either produce the correct output or indicate that an overflow may occur (o) and terminate. The columns lrs and lrsMP give the running times for the hybrid versions. When 64 or 128-bit arithmetic suffices, these two running times are essentially the same. When extended precision is required, the internal MP arithmetic may be much slower than GMP, especially for c40, which requires 459 decimal digits. The final two columns show the speedups obtained by using the multicore versions mplrs and mplrsgmp with 40 processors available. In the table we show in blue which arithmetic version was in use when lrs, shown in red, terminated. As is to be expected, lrs performs roughly the same as lrsgmp when the integers become too large for fixed precision.

         Single Arithmetic       Hybrid Arithmetic        40 cores
Name    lrs1  lrs2  lrsgmp     lrs  lrsMP  lrsflint  mplrsgmp  mplrs
c30      (o)   (o)      36      36    295        42         2      3
c40      (o)   (o)    7707    7581  97532      8341       402    389
km22     (o)   (o)     237     234    384       187         8      8
perm10   288   538    2429     283    284      1120       113     16
vf500    (o)   (o)     139     137   1658       163        12     11
vf900    (o)    23     103      23     23       170         8      2
mit71    (o)   (o)   15489   15474 110467     26432       731    723
fq48      44    77     265      44     44       147        11      1
mit      (o)   134     563     132    131       270        27      7
cp7
bv7
p8-6

(o) = possible overflow detected

Table 4: Comparison of running times for various types of arithmetic, lrslib v. 7.1 (mai48)

Speedups of 2–4 times are observed for the combinatorial problems, which can be solved using only 64 or 128 bits. In general lrsflint did not perform as well as lrs, but it is usually faster than lrsgmp on the combinatorial problems. The approach used in lrsarith hybrid arithmetic allows it to obtain essentially native integer performance, but requires effort from the developer to handle overflow. FLINT is easier for the developer but did not achieve the same performance when all values are small. Another possible reason for the lack of performance is that we did not use FLINT matrices, as this would have required substantial reprogramming of lrs.

We can observe that 128-bit arithmetic runs roughly 1.5–2 times slower than 64-bit arithmetic, when 64-bit arithmetic suffices. Hence there is a strong incentive to start the computation with 64 bits, using 128 bits only when necessary. This latter outcome occurred for mit and vf900.

We introduced lrsarith, a small C library for performing fixed precision arithmetic on integers and rationals with overflow protection that also allows hybrid arithmetic. It was developed as part of the lrslib polyhedral computation package. However, due to its small size and ease of use, we decided to release it as an independent package. The details of the method used and small examples were given in the first part of the paper.

In the second part we gave computational results for some vertex enumeration problems. The examples show the diversity of such problems, and this is reflected in the arithmetic precision required to solve them. Many combinatorial polytopes involve calculations that can be completed without overflow using 64 (or 128) bit integers.
Using fixed arithmetic with overflow checking or hybrid arithmetic, speedups of 2–4 times can be achieved, and these carry over to parallel implementations. The new hybrid version described here explains the performance improvements found in lrslib version 7 relative to previous versions in earlier comparisons [2].
Acknowledgements
We thank David Bremner for many helpful discussions and in particular for the elegant implementation of name mangling. This work was partially supported by JSPS Kakenhi Grants 16H02785, 18H05291, 18K18027, 20H00579 and 20H00595.
References

[1] D. Avis. http://cgm.cs.mcgill.ca/~avis/C/lrs.html.
[2] David Avis and Charles Jordan. mplrs: A scalable parallel vertex/facet enumeration code. Mathematical Programming Computation, 10(2):267–302, 2018.
[3] David Barina. Convergence verification of the Collatz problem. The Journal of Supercomputing, 2020.
[4] T. Christof and A. Loebel. http://porta.zib.de.
[5] William B. Hart. Fast library for number theory: an introduction. In Mathematical Software – ICMS 2010, volume 6327 of Lecture Notes in Computer Science, pages 88–91, 2010.
[6] Jeffrey C. Lagarias. The 3x+1 problem: An annotated bibliography (1963–1999) (sorted by author). arXiv:math/0309224, 2003.
[7] Jeffrey C. Lagarias. The 3x+1 problem: An annotated bibliography, II (2000–2009). arXiv:math/0608208, 2006.
[8] Günter M. Ziegler. Lectures on polytopes. Springer, 1995.
A Single arithmetic code for repeated squaring
Listing 1: fixed.c

#include <stdio.h>
#include <stdlib.h>
#include "lrsarith.h"

FILE *lrs_ifp, *lrs_ofp;

void lrs_overflow(int parm)
{
  fprintf(lrs_ofp, "overflow detected: halting\n");
  exit(1);
}

int main(void)
{
  long i, k;
  lrs_mp a, b;

  lrs_mp_init(10, stdin, stdout);
  lrs_alloc_mp(a);
  lrs_alloc_mp(b);
  fscanf(lrs_ifp, "%ld", &k);
  itomp(k, b);
  pmp("", b);
  for (i = 1; i <= 6; i++)
  {
    copy(a, b);
    mulint(a, a, b);
    pmp(" ", b);
  }
  lrs_clear_mp(a);
  lrs_clear_mp(b);
  fprintf(lrs_ofp, "\n");
  return 0;
}

Listing 2: makefile

fixed: fixed.c lrsarith.c lrsarith.h lrslong.c lrslong.h lrsgmp.c lrsgmp.h lrsmp.c lrsmp.h
	$(CC) -DLRSLONG -DSAFE -o fixed1 lrsarith.c fixed.c
	$(CC) -DLRSLONG -DB128 -DSAFE -o fixed2 lrsarith.c fixed.c
	$(CC) -DLRSLONG -o fixed1n lrsarith.c fixed.c
	$(CC) -DLRSLONG -DB128 -o fixed2n lrsarith.c fixed.c
	$(CC) -DMP -o fixedmp lrsarith.c fixed.c
	$(CC) -DGMP -o fixedgmp lrsarith.c fixed.c -lgmp

B Hybrid arithmetic code for repeated squaring
Listing 3: hybridlib.c

#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
#include "lrsarith.h"

#define run suf(run)

static jmp_buf buf1;        /* return location when overflowing */

int run(long k)
{
  long i;
  lrs_mp a, b;

  lrs_mp_init(0, stdin, stdout);
  lrs_alloc_mp(a);
  lrs_alloc_mp(b);
  itomp(k, b);
  pmp("", b);
  if (!setjmp(buf1))        /* overflow test */
  {
    for (i = 1; i <= 6; i++)
    {
      copy(a, b);
      mulint(a, a, b);
      pmp(" ", b);
    }
    lrs_clear_mp(a);
    lrs_clear_mp(b);
    return 0;
  }
  printf("overflow detected: restarting\n");
  lrs_clear_mp(a);
  lrs_clear_mp(b);
  return 1;
}

void lrs_overflow(int parm)
{
  longjmp(buf1, 1);
}

#if defined(MA) && defined(LRSLONG)
#ifdef B128
int run2(long k) {          /* 128 bit */
#else
int run1(long k) {          /* 64 bit */
#endif
#else
int runmp(long k) {         /* other arithmetic */
#endif
  return (run(k));
}

Listing 4: hybrid.h

#include <stdio.h>
#include <stdlib.h>

FILE *lrs_ifp;              /* input file pointer */
FILE *lrs_ofp;              /* output file pointer */

int run1(long k);
int run2(long k);
int runmp(long k);

Listing 5: hybrid.c

#include "hybrid.h"

int main(void)
{
  long k;
  scanf("%ld", &k);
  if (run1(k))              /* TRUE on 64 bit overflow */
    if (run2(k))            /* TRUE on 128 bit overflow */
      runmp(k);
  printf("\n");
  return 0;
}

Listing 6: makefile

HOBJ=lrslong1.o hybridlib1.o lrslong2.o lrsgmp.o hybridlib2.o hybridlibgmp.o

hybrid: hybrid.c lrslong.c lrslong.h lrsgmp.c lrsgmp.h hybridlib.c hybrid.h lrsarith.c
	$(CC) -DMA -DSAFE -DLRSLONG -c -o lrslong1.o lrsarith.c
	$(CC) -DMA -DB128 -DSAFE -DLRSLONG -c -o lrslong2.o lrsarith.c
	$(CC) -DMA -DGMP -c -o lrsgmp.o lrsarith.c
	$(CC) -DMA -DSAFE -DLRSLONG -c -o hybridlib1.o hybridlib.c
	$(CC) -DMA -DB128 -DSAFE -DLRSLONG -c -o hybridlib2.o hybridlib.c
	$(CC) -DMA -DGMP -c -o hybridlibgmp.o hybridlib.c
	$(CC) -DMA -DLRSLONG -DSAFE -o hybrid ${HOBJ} hybrid.c -lgmp