Brightening the Optical Flow through Posit Arithmetic
Preprint (accepted at ISQED 2021)
Vinay Saxena∗, Ankitha Reddy∗, Jonathan Neudorfer∗, John Gustafson‡, Sangeeth Nambiar∗, Rainer Leupers†, Farhad Merchant†
∗Bosch Research and Technology Centre - India, Bangalore
†Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, Germany
‡National University of Singapore, Singapore
{vinay.saxena, ankitha.reddy, jonathan.neudorfer, sangeeth.nambiar}@in.bosch.com, [email protected], {farhad.merchant, leupers}@ice.rwth-aachen.de

Abstract—As new technologies are invented, their commercial viability needs to be carefully examined along with their technical merits and demerits. The posit™ data format, proposed as a drop-in replacement for the IEEE 754™ float format, is one such invention that requires extensive theoretical and experimental study to identify products that can benefit from the advantages of posits for specific market segments. In this paper, we present an extensive empirical study of posit-based arithmetic vis-à-vis IEEE 754 compliant arithmetic for the optical flow estimation method called Lucas-Kanade (LuKa). First, we use the SoftPosit and SoftFloat format emulators to perform an empirical error analysis of the LuKa method. Our study shows that the average error in LuKa with SoftPosit is an order of magnitude lower than in LuKa with SoftFloat. We then present the integration of the hardware implementation of a posit adder and multiplier in a RISC-V open-source platform. We make several recommendations, along with the analysis of LuKa in the RISC-V context, for future-generation platforms incorporating posit arithmetic units.
Index Terms—Optical flow, computer arithmetic, posits, floating-point, Lucas-Kanade algorithm
I. INTRODUCTION
The posit data type is proposed as a drop-in replacement for the IEEE 754 compliant floating-point format [1]. The posit format offers compelling advantages over the IEEE 754 compliant float format, such as higher accuracy and a wider dynamic range. For arithmetic operations, posits require simpler hardware compared to a fully-compliant implementation of IEEE 754 floats [2][3]. It has been shown experimentally that an n-bit floating-point adder/multiplier can be replaced by an m-bit posit adder/multiplier, where m < n, without compromising accuracy and range [4][5]. This is due to the greater information-per-bit in the posit data type compared to its IEEE-compliant counterpart. Several researchers around the world are working on the efficient realization of posit arithmetic units, and studies of posit arithmetic for different application domains have been published [6][7]. The SoftPosit emulation library supports float-like arithmetic operations with different posit configurations and is closely patterned after the
SoftFloat library from Berkeley. We believe the time has arrived to apply SoftPosit and SoftFloat to analyze the merits of posits versus floats for widely-used commercial applications.

Since the inception of the posit data representation, there have been several implementations of posit arithmetic operations in the literature. The early open-source hardware implementations of a posit adder and multiplier were presented in [2] and [4]. In [4], the authors covered the design of a parametric adder/subtractor, while in [2], the authors presented parametric designs of float-to-posit and posit-to-float converters and a multiplier, along with the design of an adder/subtractor. A major disadvantage of the designs presented in [2] and [4] is that they are yet to be fully verified and contain multiple errors. The PACoGen open-source framework, which can generate a pipelined adder/subtractor, multiplier, and divider, is presented in [7]. The design presented in [7] has the disadvantage of not synthesizing for exponent size zero, and hence cannot be considered a fully parametric implementation. A more complete implementation of a parametric posit adder and multiplier generator is presented in [5].
Optical flow is caused by the relative motion of an observer and a scene that has objects in motion. Out of the several methods in the literature, we choose the Lucas-Kanade (LuKa) method for our experiments due to its simplicity and computational intensity [8]. Recently, the open-source instruction set architecture (ISA) called RISC-V has gained a following in industry and academia. We integrated a posit adder and multiplier with the RI5CY core [9] to create a posit-enabled RISC-V implementation. We compare area and energy numbers for field-programmable gate array (FPGA) synthesis of a RI5CY core with IEEE 754 compliant arithmetic and with posit arithmetic. The major contributions in this paper are as follows:
• A detailed empirical study of LuKa using SoftPosit and SoftFloat, where we compare numerical accuracy in LuKa for posits and IEEE 754 compliant floats
• A RISC-V-based comparison of area and delay using posits versus fully-compliant IEEE 754 floats (to the best of our knowledge, this is the first such study)
• Performance analysis of LuKa on RISC-V with posit and IEEE 754 compliant floats, and a discussion of current research issues in posit arithmetic
The rest of the paper is organized as follows: In Section II, we present an overview of the IEEE 754-2019 format, the posit number format, and the LuKa method, along with the relevant literature. In Section III, accuracy analyses of LuKa using SoftFloat and SoftPosit are discussed in detail. A hardware implementation is presented in Section IV along with performance measurements. We summarize our conclusions in Section V.

Fig. 1: Generic comparison of IEEE 754 floating-point (float) and posit number formats for non-exception values

II. BACKGROUND
A. IEEE 754 Compliant and Posit Number Systems
The IEEE 754-2019 binary floating-point format numbers have three parts for normal floats: a sign, an exponent, and a fraction (see Fig. 1). The sign is the most significant bit and indicates whether the number is positive or negative. In single precision, the next 8 bits represent the exponent of the binary number, ranging from −126 to 127 after the bias is subtracted. The remaining bits represent the fractional part. The format is:

val = (−1)^sign × 2^(exp − bias) × (1.fraction)    (1)

When the exponent bits express the minimum (all 0 bits) or maximum (all 1 bits), an exception value is indicated. It is currently common for vendors to claim IEEE 754 compliance in their hardware while actually complying only for the case of normal floats. Full IEEE 754 compliance for exception cases, deemed to be rare, is seldom supported in hardware; instead, traps to software or microcode are used. This approach degrades both performance and security, since data-dependent timing creates a side-channel security hole.

Posit arithmetic was proposed as a drop-in replacement for IEEE 754 arithmetic in 2017 [1]. Posit arithmetic has several advantages over IEEE 754 arithmetic: higher accuracy for the most commonly-used values, simpler hardware implementation, smaller chip area, and lower energy cost [5][10]. Unlike IEEE 754 floats, there are no subnormal posit numbers, nor is there any need for them; x − y produces a zero result if and only if x = y. There are only two exception cases: zero and not-a-real (NaR). For all other cases, the value val of a posit is given by

val = (−1)^sign × useed^k × 2^exp × (1 + Σ_{i=1}^{fn} b_{fn−i} · 2^{−i})    (2)

where fn is the number of fraction bits and b_j denotes the j-th fraction bit. The regime indicates a scale factor of useed^k, where useed = 2^(2^es) and es is the exponent size. The numerical value of k is determined by the run length of 0 or 1 bits in the string of regime bits. Run-length encoding of the regime automatically allows more fraction bits for the more common values whose magnitudes are closer to 1, and thus provides tapered accuracy in a bit-efficient way that preserves ordering. Further details about the posit number format and posit arithmetic can be found in [1]. The posit format is depicted in Fig. 1.
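To make the regime/exponent/fraction decoding of equation (2) concrete, the following is a minimal C sketch (ours, not the paper's code) that converts a 16-bit, es = 2 posit bit pattern to a double; a production flow would rely on the SoftPosit library's converters instead.

```c
#include <stdint.h>
#include <math.h>

/* Illustrative decoder for a 16-bit, es = 2 posit; returns the value as a double. */
double posit16_es2_to_double(uint16_t bits) {
    const int es = 2;
    if (bits == 0x0000) return 0.0;   /* zero */
    if (bits == 0x8000) return NAN;   /* NaR (not-a-real), mapped to NaN here */

    int sign = (bits >> 15) & 1;
    uint16_t body = sign ? (uint16_t)(0 - bits) : bits;  /* 2's complement for negatives */

    /* Regime: run length of identical bits starting at bit 14. */
    int lead = (body >> 14) & 1;
    int i = 14, run = 0;
    while (i >= 0 && ((body >> i) & 1) == lead) { run++; i--; }
    int k = lead ? run - 1 : -run;
    i--;                                /* skip the regime-terminating bit */

    /* Up to es exponent bits; bits pushed off the end count as 0. */
    int exp = 0, got = 0;
    while (got < es && i >= 0) { exp = (exp << 1) | ((body >> i) & 1); got++; i--; }
    exp <<= (es - got);

    /* Remaining bits form the fraction, with hidden bit 1. */
    double frac = 1.0, w = 0.5;
    while (i >= 0) { if ((body >> i) & 1) frac += w; w *= 0.5; i--; }

    /* useed^k * 2^exp = 2^(k * 2^es + exp); applied via ldexp. */
    double val = ldexp(frac, k * (1 << es) + exp);
    return sign ? -val : val;
}
```

For example, the pattern 0x4000 decodes to 1.0 (regime k = 0, exponent 0, empty fraction), while 0x0001 decodes to minpos = 2^−56, illustrating how the run-length-encoded regime trades fraction bits for dynamic range.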
B. Lucas-Kanade Method

Despite its limitations in determining optical flow information in uniform images, the LuKa technique and its variants are the most widely used methods for estimating optical flow in commercial products [11]. Suppose I is the brightness of the pixel at position (x(t), y(t)) at time t. We wish to solve

I_x u_x + I_y u_y + I_t = 0    (3)

where I_x, I_y, and I_t represent the x, y, and t directional gradients, respectively, and u_x and u_y represent the optical flow to calculate. To solve this equation, a local smoothness constraint is added, which assumes the change in u_x and u_y in a small neighborhood of pixels to be extremely small. The final vector u containing the flow components is obtained from the equation

u = (A^T A)^{−1} A^T B    (4)

where A is the directional derivative matrix of the image and B is the time derivative vector. The derivatives here are simple deltas from one image to the next, with a resolution of 1/255. Since the matrix A^T A is a 2 × 2 matrix, we use Cramer's rule for the matrix inversion.
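As a concrete illustration of equation (4) with Cramer's rule on the 2 × 2 normal matrix, the C sketch below solves one LuKa window. It assumes precomputed gradient arrays Ix, Iy, It over the N pixels of the window (names are ours, for illustration), and folds the sign of the time derivative into the right-hand side.

```c
#include <stddef.h>

typedef struct { double ux, uy; } Flow;

/* Solve one LuKa window: (A^T A) u = -A^T b via Cramer's rule,
   where A stacks the (Ix, Iy) rows and b stacks the It values. */
Flow lucas_kanade_window(const double *Ix, const double *Iy,
                         const double *It, size_t N) {
    double sxx = 0, sxy = 0, syy = 0, sxt = 0, syt = 0;
    for (size_t i = 0; i < N; i++) {   /* accumulate the 2x2 normal system */
        sxx += Ix[i] * Ix[i];
        sxy += Ix[i] * Iy[i];
        syy += Iy[i] * Iy[i];
        sxt += Ix[i] * It[i];
        syt += Iy[i] * It[i];
    }
    double det = sxx * syy - sxy * sxy;
    Flow u = { 0.0, 0.0 };
    if (det != 0.0) {                  /* Cramer's rule on the 2x2 system */
        u.ux = (-sxt * syy + syt * sxy) / det;
        u.uy = (-syt * sxx + sxt * sxy) / det;
    }
    return u;
}
```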
C. Related Work

There have been several attempts at posit hardware implementation since the first proposal. The early parameterized designs were presented in [2], [4], [7], and [5]. The designs presented in [2], [4], and [7] are open-source but do not synthesize for exponent size zero, while the design presented in [5] is not open-source. A power-efficient posit multiplier is presented in [12]. The authors in [12] present a scheme that divides the fraction part of the multiplier into several chunks and uses them efficiently, resulting in 16% power efficiency over the baseline implementation.

Several posit implementations focus explicitly on machine learning applications. A performance-efficiency trade-off study for deep neural network (DNN) inference is presented in [13], where the authors discuss overall neural network efficiency and performance trade-offs. A template-based posit multiplier is presented in [14], where the authors cover both training and inference of neural networks and show that 8-bit posits are as good as floats in inference. The Deep Positron DNN architecture presented in [15] shows trade-offs between performance and hardware resources; it uses an FPGA-based soft core to control the multiply-accumulate unit hardware (fixed-point, floating-point, and posit). The Cheetah framework presented in [16] incorporates mixed-precision arithmetic alongside support for the conventional formats.

RISC-V integrations of posit arithmetic hardware are presented in PERI [17], PERC [18], and Clarinet [19]. PERI [17] uses the SHAKTI C-class core as a base to attach posit arithmetic hardware, first as a tightly-coupled unit and then as an accelerator connected through the rocket custom coprocessor (RoCC) interface. PERC [18] delves into similar aspects while using RocketCore as a base. The Flute RISC-V core from Bluespec Inc. is used for posit arithmetic experimentation in Clarinet [19], where Melodica is the tightly-coupled posit core. Clarinet has the unique feature of also supporting the quire register for exact dot products; fused multiply-accumulation is a special case of accumulation in the quire register. There also exist a couple of commercial attempts, such as the CRISP core by Calligo Technologies [20] and VividSparks [21].

Very few implementations in the literature focus on application-specific posit arithmetic tuning, in which extensive analyses are performed before delving into the hardware design. In our approach, we first emphasize application analysis, followed by RISC-V integration of the posit arithmetic unit. For our implementation of a posit adder and multiplier, we have used improved versions of the designs proposed in [5], and for the divider, we have used the design presented in [7].
Fig. 2: Optical flow in consecutive frames in (a) synthetic images, and (b) real-life images. Images on the left and right side are the consecutive input images, and the middle images represent the optical flow. None of the images are manipulated to support any particular number format.

III. LUKA USING SOFTPOSIT AND SOFTFLOAT
The study is conducted with synthetic images of a sphere (slightly rotated in each successive frame) and real-life images of a human being (slightly translated in each successive frame), as shown in Fig. 2a and 2b. It is ensured that the images are well textured and that the motion between consecutive frames is very small, eliminating the need for regularization-based methods or multi-scale estimation.

To perform the error analysis, LuKa is implemented in the C programming language. We ensure that the implementation has no dependency on any third-party or open-source libraries. The reference implementation uses double-precision floating-point arithmetic. The code is executed with all consecutive pairs of frames as inputs over the whole data set of images, and the optical flow values are obtained.
A. Accuracy analysis for 16-bit floats, fixed-point, and posits

A primary goal of this study was to compare the result accuracy of low-precision (16-bit) posit and float formats. We also test a 32-bit fixed-point format. The grey-scale pixel values (0 to 255) are first scaled by dividing by a normalization factor norm; we tested norm values ranging from 1 to 255. Each format has a preferred norm; for example, a too-small norm for floats leads to catastrophic overflow in the matrix multiplication step of the algorithm, since the largest real value 16-bit floats can represent is 65504.

For all three formats, we compare the results with a reference result. We pick the norm value that gives the smallest absolute error. Heat maps of the absolute error for both u and v are generated to visualize the distribution of error (Fig. 3). The heat maps and data presented are for the errors in the optical flow between two particular frames selected from the synthetic and real-life image data sets, representative of the whole experiment.

For the 16-bit (half-precision) float study, we use the Berkeley SoftFloat library by John Hauser, which provides an excellent, stable software implementation of this precision that conforms to the IEEE 754 Standard. All optical flow values are calculated with 16-bit floating-point variables and operations. For the 16-bit floats, the best norm value is found to be 32 for both synthetic and real-life images (we discuss the cause and implication of this in detail later in this section).

For the fixed-point implementation, we take advantage of the libfixmath fixed-point math library. As with SoftFloat, a Q16.16 implementation of the code is prepared and executed on the same input data set. Heat maps (Fig. 3) are generated for the best cases (norm factors of 28 and 18 for the synthetic and real-life images, respectively).

The posit implementation uses Cerlane Leong's SoftPosit library. It supports two 16-bit configurations, with es = 1 and es = 2; we found es = 2 the better fit for this application. The code was ported with all variables and operations changed to posits. The use of the quire, supported by the SoftPosit library, is out of the scope of this work, but in future work it may further improve the accuracy of the matrix multiplication step. Pixel values are again normalized. Optical flow values and errors are calculated as before. For posits, a norm value of 16 presents the smallest error for the synthetic images and is next-best for the real-life images.

Table I summarizes the results obtained. "Posit 16,k" refers to 16-bit posits with es = k. For the synthetic images, the maximum error for the fixed-point format is an order of magnitude higher than for the other formats. However, this is not the case with the RMS error, which is very close to the RMS error for float 16, albeit ∼4× more than for posit 16,2. This is also evident from the heat maps and confirms that the fixed-point format gives mostly accurate values with a few values of very high absolute error. Posit 16,2 has ∼3× lower maximum and RMS errors compared to float 16, while posit 16,1 has an error profile intermediate between the two formats.

For the real-life images, the results are slightly different. It should be noted that no additional filters were applied to the images before the optical flow calculations; they were extracted from video as-is. They lack the texture and sharpness of the synthetic images and are noisier in general.
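To indicate how the porting works in practice, here is a minimal sketch of the normalization and error-measurement steps using SoftPosit's 16-bit type (posit16_t, es = 1). The converters convertDoubleToP16/convertP16ToDouble are SoftPosit's, as we understand the library's API; the array names and the harness itself are our illustration rather than the paper's code.

```c
#include <math.h>
#include "softposit.h"   /* Cerlane Leong's SoftPosit emulation library */

/* Scale a grey level (0..255) by the norm factor and convert to posit,
   so values land near 1, where posits carry the most fraction bits. */
posit16_t normalize_pixel(unsigned char grey, double norm) {
    return convertDoubleToP16((double)grey / norm);
}

/* RMS error of a 16-bit posit flow field against the double-precision
   reference flow, as reported in Table I. */
double rms_error(const double *flow_ref, const posit16_t *flow_p16, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double e = flow_ref[i] - convertP16ToDouble(flow_p16[i]);
        sum += e * e;
    }
    return sqrt(sum / n);
}
```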
TABLE I: Absolute errors in optical flow

                            Fixed 16.16   Float 16   Posit 16,1   Posit 16,2
Max Error (synthetic)       0.01579       0.0047     0.00272      0.00163
RMS Error (synthetic)       0.00057       0.00049    0.00016      0.00015
Std. Deviation (synthetic)  0.00056       0.00046    0.00015      0.00046
Max Error (real-life)       5.6692        0.125      0.13412      0.08333
RMS Error (real-life)       0.12940       0.00109    0.00234      0.00108
Std. Deviation (real-life)  0.12885       0.00108    0.00233      0.00107
Fig. 3: Error heat maps for synthetic ((a), (b), and (c)) and real-life ((d), (e), and (f)) images in y and x for fixed-point, float, and posit formats (IEEE 754-2008 64-bit reference)

Fig. 4: Trend in accuracy for different normalization factors in (a) synthetic images and (b) real-life images

It is found that both the maximum and RMS errors for the fixed-point format in this case are two orders of magnitude higher than for floats and posits. Float 16 performs equivalently to posit 16,2 and better than posit 16,1 in terms of RMS error, although the maximum error for float 16 is ∼1.5× the maximum error for posit 16,2.

The summary in Table I presents the best-in-class results, but for a more generic view of the performance of the formats, the maximum errors are plotted against the normalization factors in Fig. 4 in the form of bar charts. Normalizing by 16 gives very high accuracy for posits in both data sets (best for synthetic and next-best for real-life). In other words, scaling the original pixel values from (0–255) down to (0–16) leads to a further improvement in result accuracy. This is because of the tapered accuracy property of posits; accuracy is maximized for values close to 1 in magnitude. Dividing by 16 centers the (nonzero) pixel values x in the range 1/16 ≤ x < 16. Posit 16,2 has its maximum accuracy, 12 significant bits, in exactly this range. Float 16 is consistently less accurate than posit 16,2, and fixed-point is consistently less accurate than both floats and posits. The red bars in Fig. 4(b) indicate NaN float values that are generated for norm values in the range of 1 to 8, which are too small to prevent overflow. (Posit 16,2 can represent real values up to about 7.2 × 10^16.)

Next, we delve deeper into the float 16 and posit 16,2 formats to understand why float 16 performs so well in certain regions (such as norm = 32 for floats). Data values generated from each and every intermediate arithmetic operation performed in the LuKa algorithm are collected in the reference implementation for norm values of 255 (scaling pixels to range from 0 to 1) and 32 (scaling pixels to range from 0 to 8). This is done for both synthetic and real-life images. From this intermediate data, all the unique values are extracted and analyzed. It is found that normalizing by 32 limits the dynamic range of the data values generated during the calculation, bringing them within the dynamic range of float 16 (Fig. 5 and Fig. 6). Fig. 5 and 6 also present overlapped histograms of float 16, posit 16,2, and the unique data values generated. A good overlap indicates a better number system for the application at hand. Posits have a far wider dynamic range than floats and hence perform better in general across all norm factors. For the norm factor of 32, where float 16 has adequate dynamic range, the tapered nature of the data (with a high density of values around 1) gives a slight edge to posits, resulting in a marginally lower error at that norm, though not as low as posits achieve at their optimum norm.

Fig. 5: Histogram overlap of posits, floats, and unique data values generated during the reference (double-precision) implementation run of LuKa with normalization factors of (a) 255 and (b) 32 for synthetic images

Fig. 5 also shows that a relatively smaller error in the larger data values carries more weight in the final result accuracy than a larger error in smaller data values. However, a deeper study with more applications is needed to substantiate this claim. This study shows the advantages of using posits over other formats for calculating the optical flow using the LuKa method.
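The dynamic-range argument above can be made mechanical. The bounds in the sketch below are format facts (float 16 overflows past 65504; posit 16,2 saturates at maxpos = 16^14 = 2^56 ≈ 7.2 × 10^16), and a check of this shape, applied to every intermediate value of the reference run, reproduces the overlap analysis of Fig. 5 and 6; the helper names are ours.

```c
#include <math.h>
#include <stdbool.h>

#define FLOAT16_MAX    65504.0                 /* largest finite half-precision value */
#define POSIT16_2_MAX  7.205759403792794e16    /* maxpos = useed^(n-2) = 16^14 = 2^56 */

/* True if x survives a round-trip to the format without overflow
   (float 16) or saturation at maxpos/minpos (posit 16,2). */
bool fits_float16(double x) {
    return fabs(x) <= FLOAT16_MAX;
}
bool fits_posit16_2(double x) {
    return x == 0.0 ||
           (fabs(x) <= POSIT16_2_MAX && fabs(x) >= 1.0 / POSIT16_2_MAX);
}
```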
IV. HARDWARE IMPLEMENTATION OF LUKA

We integrate a verified posit adder and multiplier into the RI5CY core of the Pulpino platform [9]. We detach the existing floating-point unit (FPU) from the RI5CY core and integrate the adder and multiplier generated by our posit arithmetic unit (PAU) generator into the core, as shown in Fig. 7. RI5CY is a 32-bit core based on the RISC-V ISA supporting floating-point instructions. We generate a 32-bit adder and multiplier using the PAU generator for the integration.
TABLE II: Adder, multiplier synthesis results (delays in ns)

                    Adder                             Multiplier
(n, es)   LUT       Logic Delay   Net Delay   LUT       Logic Delay   Net Delay
(8,0)     185 (0)   8.83          21.12       95 (1)    7.43          13.13
(8,2)     181 (0)   9.68          20.92       96 (0)    4.28          14.07
(16,1)    400 (0)   12.77         19.01       229 (1)   10.16         13.55
(16,2)    391 (0)   14.78         20.07       226 (1)   10.76         13.09
(32,2)    866 (0)   17.30         24.57       572 (4)   15.55         16.38
TABLE III: FPGA synthesis results for FPU and posit cores

Core          LUT Count   Delay (ns)
FPU [9]       2669        50 (20 MHz)
Posit (16,1)  2082        55 (18.18 MHz)
Posit (16,2)  2024        42 (23.81 MHz)
Posit (28,2)  2780        71 (14.08 MHz)
Posit (32,2)  2810        71 (14.08 MHz)
Fig. 6: Histogram overlap of posits, floats, and unique data values generated during the reference (double-precision) implementation run of LuKa with normalization factors of (a) 255 and (b) 32 for real-life images

Fig. 7: RISC-V integration of the posit core

The developed parametric posit hardware generator allows us to choose any posit configuration (n or es) and generate adder, multiplier, integer-to-posit converter (int2pos), and posit-to-integer converter (pos2int) hardware operators. The PAU has been exhaustively tested against the SoftPosit library for configurations with n ranging from 7 to 12. Furthermore, we have also tested a 16-bit configuration for billions of input combinations, mainly covering the corner cases. Based on the conclusions obtained from the optical flow study, we synthesize and integrate a (16,2) PAU core with Pulpino. The results obtained post-integration are shown in Table III. The baseline version of the RI5CY core is the version with the native IEEE 754 FPU. Switching to the (16,2)-configured PAU core affords enormous savings in data RAM usage at a tolerable loss in accuracy for our optical flow application. Integration results for other configurations of the posit core are also provided for reference. The table also shows the FPGA delay for various configurations of the PAU core. The 16-bit versions show delays of 55 ns and 42 ns, comparable to the 50 ns achieved by the baseline FPU from Pulpino.

Table II presents the detailed synthesis results of the PAU adder and multiplier. This PAU is synthesized for the Zedboard with a Xilinx Zynq-7000 SoC, using Vivado 2018.3 for the FPGA synthesis results. Both the adder and multiplier implementations are purely combinational, without any pipelining. The DSP counts are given in parentheses next to the LUT counts for all configurations. In general, reasonably good area and delay numbers are observed. We benchmark our PAU against the other published results on posit hardware in Table IV. NS marks configurations that are not synthesizable, and NR signifies results not reported in the paper. Again, the DSP counts are given in parentheses next to the LUT counts for the multiplier. The adders do not use any DSP blocks across all the implementations. The LUT count of this work shows a significant improvement over the existing parametric posit hardware generators [7], [5]. It is also more extensively tested than these previous works. [22] shows a lower area footprint and is a good candidate for our future implementations. To the best of our knowledge, [22] and this work are the best published implementations of a parametric posit hardware generator.

TABLE IV: LUT count comparison

                 Adder                       Multiplier
(n, es)   Ours   [5]   [7]    [22]   Ours      [5]       [7]       [22]
(8,0)     185    NS    NS     NR     95 (1)    NS        NS        NR
(8,2)     181    208   196    NR     96 (0)    131 (0)   123 (0)   NR
(16,1)    400    391   460    320    229 (1)   218 (1)   271 (1)   253 (1)
(16,2)    391    404   492    NR     226 (1)   223 (1)   272 (1)   NR
(32,2)    866    981   1115   745    572 (4)   572 (4)   648 (4)   469 (4)
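The exhaustive comparison against SoftPosit mentioned above can be sketched as a simple golden-model loop. Here p16_add and the posit16_t bit field v are SoftPosit's (16-bit, es = 1), while pau_add16, the hook into the synthesized adder (e.g., driven through an RTL simulator), is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include "softposit.h"

/* Hypothetical hook returning the raw 16-bit output of the synthesized PAU adder. */
extern uint16_t pau_add16(uint16_t a, uint16_t b);

/* Compare the hardware adder against SoftPosit's 16-bit golden model
   over all 2^32 input bit patterns. */
int verify_adder16(void) {
    uint32_t mismatches = 0;
    for (uint32_t a = 0; a <= 0xFFFF; a++) {
        for (uint32_t b = 0; b <= 0xFFFF; b++) {
            posit16_t pa, pb;
            pa.v = (uint16_t)a;
            pb.v = (uint16_t)b;
            uint16_t ref = p16_add(pa, pb).v;   /* SoftPosit reference result */
            if (pau_add16((uint16_t)a, (uint16_t)b) != ref)
                mismatches++;
        }
    }
    printf("mismatches: %u\n", mismatches);
    return mismatches == 0;
}
```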
V. CONCLUSION

The purpose of this work is to analyze the benefits and shortcomings of posit arithmetic versus floats in real-world applications. We have demonstrated the clear advantage of using posits instead of floats for calculating optical flow using LuKa. An order of magnitude improvement in accuracy is observed when the algorithm is implemented using posits instead of floats on synthetic images; for real-life images, the accuracy is comparable. A fixed-point approach has accuracy too low to be viable. The algorithm was then implemented in hardware on a RISC-V core modified to support posit 16,1 and 32,2. The synthesis results of the modified core, as well as the posit arithmetic unit, were presented; Pulpino performs well with a lower LUT count than with the single-precision FPU. Our PAU is also shown to be comparable (if not better) in terms of area to other state-of-the-art posit hardware.
REFERENCES

[1] J. Gustafson and I. Yonemoto, "Beating floating point at its own game: Posit arithmetic," Supercomput. Front. Innov.: Int. J., vol. 4, no. 2, pp. 71–86, June 2017.
[2] M. K. Jaiswal and H. K.-H. So, "Universal number posit arithmetic generator on FPGA," in DATE 2018, Dresden, Germany, March 19–23, 2018, pp. 1159–1162.
[3] A. Guntoro, C. De La Parra, F. Merchant, F. De Dinechin, J. L. Gustafson, M. Langhammer, R. Leupers, and S. Nambiar, "Next generation arithmetic for edge computing," in DATE 2020, pp. 1357–1365.
[4] M. K. Jaiswal and H. So, "Architecture generator for type-3 unum posit adder/subtractor," in IEEE International Symposium on Circuits and Systems (ISCAS), May 2018, pp. 1–5.
[5] R. Chaurasiya, J. Gustafson, R. Shrestha, J. Neudorfer, S. Nambiar, K. Niyogi, F. Merchant, and R. Leupers, "Parameterized posit arithmetic hardware generator," in ICCD 2018, pp. 334–341.
[6] S. Nambi, S. Ullah, A. Lohana, S. S. Sahoo, F. Merchant, and A. Kumar, "ExPAN(N)D: Exploring posits for efficient artificial neural network design in FPGA-based systems," arXiv, 2020.
[7] M. K. Jaiswal and H. K.-H. So, "PACoGen: A hardware posit arithmetic core generator," IEEE Access, vol. 7, pp. 74586–74601, 2019.
[8] B. K. P. Horn and B. G. Schunck, "Determining optical flow," Tech. Rep., Cambridge, MA, USA, 1980.
[9] M. Gautschi, P. D. Schiavone, A. Traber, I. Loi, A. Pullini, D. Rossi, E. Flamand, F. K. Gürkaynak, and L. Benini, "Near-threshold RISC-V core with DSP extensions for scalable IoT endpoint devices," IEEE TVLSI, vol. 25, no. 10, pp. 2700–2713, Oct. 2017.
[10] I. Alouani, A. Ben Khalifa, F. Merchant, and R. Leupers, "An investigation on inherent robustness of posit data representation," in Proceedings of the International Conference on VLSI Design (VLSID), Feb. 2021.
[11] H. Seong, C. E. Rhee, and H. Lee, "A novel hardware architecture of the Lucas–Kanade optical flow for reduced frame memory access," IEEE TCSVT, vol. 26, no. 6, pp. 1187–1199, June 2016.
[12] H. Zhang and S. Ko, "Design of power efficient posit multiplier," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 5, pp. 861–865, 2020.
[13] Z. Carmichael, H. F. Langroudi, C. Khazanov, J. Lillie, J. L. Gustafson, and D. Kudithipudi, "Performance-efficiency trade-off of low-precision numerical formats in deep neural networks," in Proceedings of the Conference for Next Generation Arithmetic (CoNGA'19), New York, NY, USA, 2019, Association for Computing Machinery.
[14] R. Murillo Montero, A. A. Del Barrio, and G. Botella, "Template-based posit multiplication for training and inferring in neural networks," arXiv, 2019.
[15] Z. Carmichael, H. F. Langroudi, C. Khazanov, J. Lillie, J. L. Gustafson, and D. Kudithipudi, "Deep Positron: A deep neural network using the posit number system," in DATE 2019, pp. 1421–1426.
[16] H. F. Langroudi, Z. Carmichael, D. Pastuch, and D. Kudithipudi, "Cheetah: Mixed low-precision hardware & software co-design framework for DNNs on the edge," 2019.
[17] S. Tiwari, N. Gala, C. Rebeiro, and V. Kamakoti, "PERI: A posit enabled RISC-V core," 2019.
[18] A. M V, G. Bhairathi, and H. Hayatnagarkar, "PERC: Posit enhanced Rocket Chip," May 2020.
[19] R. Jain, N. Sharma, F. Merchant, S. Patkar, and R. Leupers, "Clarinet: A RISC-V based framework for posit arithmetic empiricism," arXiv, 2020.
[20] Calligo Technologies, "Calligo RISC-V with posits (CRISP)," 2020.
[21] VividSparks, "Posit for next generation computer arithmetic," 2020.
[22] Y. Uguen, L. Forget, and F. de Dinechin, "Evaluating the hardware cost of the posit number system," in FPL 2019.