Brightening the Optical Flow through Posit Arithmetic
Preprint (accepted at ISQED 2021)
Vinay Saxena∗, Ankitha Reddy∗, Jonathan Neudorfer∗, John Gustafson‡, Sangeeth Nambiar∗, Rainer Leupers†, Farhad Merchant†
∗Bosch Research and Technology Centre - India, Bangalore
†Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, Germany
‡National University of Singapore, Singapore
{vinay.saxena, ankitha.reddy, jonathan.neudorfer, sangeeth.nambiar}@in.bosch.com, [email protected], {farhad.merchant, leupers}@ice.rwth-aachen.de

Abstract—As new technologies are invented, their commercial viability needs to be carefully examined along with their technical merits and demerits. The posit™ data format, proposed as a drop-in replacement for the IEEE 754™ float format, is one such invention that requires extensive theoretical and experimental study to identify products that can benefit from the advantages of posits for specific market segments. In this paper, we present an extensive empirical study of posit-based arithmetic vis-à-vis IEEE 754 compliant arithmetic for the optical flow estimation method called Lucas-Kanade (LuKa). First, we use the SoftPosit and SoftFloat format emulators to perform an empirical error analysis of the LuKa method. Our study shows that the average error in LuKa with SoftPosit is an order of magnitude lower than in LuKa with SoftFloat. We then present the integration of the hardware implementation of a posit adder and multiplier in a RISC-V open-source platform. We make several recommendations, along with the analysis of LuKa in the RISC-V context, for future-generation platforms incorporating posit arithmetic units.
Index Terms—Optical flow, computer arithmetic, posits, floating-point, Lucas-Kanade algorithm
I. INTRODUCTION
The posit data type is proposed as a drop-in replacement for the IEEE 754 compliant floating-point format [1]. The posit format offers compelling advantages over the IEEE 754 compliant float format, such as higher accuracy and a wider dynamic range. For arithmetic operations, posits require simpler hardware compared to a fully-compliant implementation of IEEE 754 floats [2][3]. It has been shown experimentally that an n-bit floating-point adder/multiplier can be replaced by an m-bit posit adder/multiplier, where m < n, without compromising accuracy and range [4][5]. This is due to the greater information-per-bit in the posit data type compared to its IEEE-compliant counterpart. Several researchers around the world are working on the efficient realization of posit arithmetic units, and studies of posit arithmetic for different application domains have been published [6][7]. The SoftPosit emulation library supports float-like arithmetic operations with different posit configurations and is closely patterned after the
SoftFloat library from Berkeley. We believe the time has arrived to apply SoftPosit and SoftFloat to analyze the merits of posits versus floats for widely-used commercial applications.

Since the inception of the posit data representation, there have been several implementations of posit arithmetic operations in the literature. The early open-source hardware implementations of a posit adder and multiplier were presented in [2] and [4]. In [4], the authors covered the design of a parametric adder/subtractor, while in [2], the authors presented parametric designs of float-to-posit and posit-to-float converters and a multiplier, along with the design of an adder/subtractor. A major disadvantage of the designs presented in [2] and [4] is that they are yet to be fully verified and contain multiple errors. The PACoGen open-source framework, which can generate a pipelined adder/subtractor, multiplier, and divider, is presented in [7]. The design presented in [7] has the disadvantage of not synthesizing for exponent size zero, and hence cannot be considered a fully parametric implementation. A more complete implementation of a parametric posit adder and multiplier generator is presented in [5].
Optical flow is caused by the relative motion of an observer and a scene that has objects in motion. Out of the several methods in the literature, we choose the Lucas-Kanade (LuKa) method for our experiments due to its simplicity and computational intensity [8]. Recently, the open-source instruction set architecture (ISA) called RISC-V has gained a following in industry and academia. We integrated a posit adder and multiplier with the RI5CY core [9] to create a posit-enabled RISC-V implementation. We compare area and energy numbers for field-programmable gate array (FPGA) synthesis of a RI5CY core with IEEE 754 compliant arithmetic and with posit arithmetic. The major contributions in this paper are as follows:
• A detailed empirical study of LuKa using SoftPosit and SoftFloat, where we compare numerical accuracy in LuKa for posits and IEEE 754 compliant floats
• A RISC-V-based comparison of area and delay using posits versus fully-compliant IEEE 754 floats (to the best of our knowledge, this is the first such study)
• Performance analysis of LuKa on RISC-V with posit and IEEE 754 compliant floats, and a discussion of current research issues in posit arithmetic
The rest of the paper is organized as follows: In Section II, we present an overview of the IEEE 754-2019 format, the posit number format, and the LuKa method, along with the relevant literature. In Section III, accuracy analyses of LuKa using SoftFloat and SoftPosit are discussed in detail. A hardware implementation is presented in Section IV along with performance measurements. We summarize our conclusions in Section V.

Fig. 1: Generic comparison of IEEE 754 floating-point (float) and posit number formats for non-exception values

II. BACKGROUND
A. IEEE 754 Compliant and Posit Number Systems
The IEEE 754-2019 binary floating-point format numbers have three parts for normal floats: a sign, an exponent, and a fraction (see Fig. 1). The sign is the most significant bit and indicates whether the number is positive or negative. In single precision, the next 8 bits represent the exponent of the binary number, ranging from −126 to 127 after the bias is subtracted. The remaining bits represent the fractional part. The format is:

val = (−1)^sign × 2^(exp − bias) × (1.fraction)    (1)

When the exponent bits express the minimum (all 0 bits) or maximum (all 1 bits), an exception value is indicated. It is currently common for vendors to claim IEEE 754 compliance in their hardware while actually complying only for the case of normal floats. Full IEEE 754 compliance for exception cases, deemed to be rare, is seldom supported in hardware; instead, traps to software or microcode are used. This approach degrades both performance and security, since data-dependent timing creates a side-channel security hole.

Posit arithmetic was proposed as a drop-in replacement for IEEE 754 arithmetic in 2017 [1]. Posit arithmetic has several advantages over IEEE 754 arithmetic: higher accuracy for the most commonly-used values, simpler hardware implementation, smaller chip area, and lower energy cost [5][10]. Unlike IEEE 754 floats, there are no subnormal posit numbers, nor is there any need for them; x − y produces a zero result if and only if x = y. There are only two exception cases: zero and not-a-real (NaR). For all other cases, the value val of a posit is given by

val = (−1)^sign × useed^k × 2^exp × (1 + Σ_{i=1}^{fn} b_{fn−i} · 2^{−i})    (2)

where fn is the number of fraction bits and b_j denotes the j-th fraction bit. The regime indicates a scale factor of useed^k, where useed = 2^(2^es) and es is the exponent size. The numerical value of k is determined by the run length of 0 or 1 bits in the string of regime bits. Run-length encoding of the regime automatically allows more fraction bits for the more common values whose magnitudes are closer to 1, and thus provides tapered accuracy in a bit-efficient way that preserves ordering. Further details about the posit number format and posit arithmetic can be found in [1]. The posit format is depicted in Fig. 1.
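To make the regime/exponent/fraction decoding of equation (2) concrete, the following is a minimal C sketch (ours, not the paper's code) that converts a 16-bit, es = 2 posit bit pattern to a double; a production flow would rely on the SoftPosit library's converters instead.

```c
#include <stdint.h>
#include <math.h>

/* Illustrative decoder for a 16-bit, es = 2 posit; returns the value as a double. */
double posit16_es2_to_double(uint16_t bits) {
    const int es = 2;
    if (bits == 0x0000) return 0.0;   /* zero */
    if (bits == 0x8000) return NAN;   /* NaR (not-a-real), mapped to NaN here */

    int sign = (bits >> 15) & 1;
    uint16_t body = sign ? (uint16_t)(0 - bits) : bits;  /* 2's complement for negatives */

    /* Regime: run length of identical bits starting at bit 14. */
    int lead = (body >> 14) & 1;
    int i = 14, run = 0;
    while (i >= 0 && ((body >> i) & 1) == lead) { run++; i--; }
    int k = lead ? run - 1 : -run;
    i--;                                /* skip the regime-terminating bit */

    /* Up to es exponent bits; bits pushed off the end count as 0. */
    int exp = 0, got = 0;
    while (got < es && i >= 0) { exp = (exp << 1) | ((body >> i) & 1); got++; i--; }
    exp <<= (es - got);

    /* Remaining bits form the fraction, with hidden bit 1. */
    double frac = 1.0, w = 0.5;
    while (i >= 0) { if ((body >> i) & 1) frac += w; w *= 0.5; i--; }

    /* useed^k * 2^exp = 2^(k * 2^es + exp); applied via ldexp. */
    double val = ldexp(frac, k * (1 << es) + exp);
    return sign ? -val : val;
}
```

For example, the pattern 0x4000 decodes to 1.0 (regime k = 0, exponent 0, empty fraction), while 0x0001 decodes to minpos = 2^−56, illustrating how the run-length-encoded regime trades fraction bits for dynamic range.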
B. Lucas-Kanade Method

Despite its limitations in determining optical flow information in uniform images, the LuKa technique and its variants are the most widely used methods for estimating optical flow in commercial products [11]. Suppose I is the brightness of the pixel at position (x(t), y(t)) at time t. We wish to solve

I_x u_x + I_y u_y + I_t = 0    (3)

where I_x, I_y, and I_t represent the x, y, and t directional gradients, respectively, and u_x and u_y represent the optical flow to calculate. To solve this equation, a local smoothness constraint is added, which assumes the change in u_x and u_y in a small neighborhood of pixels to be extremely small. The final vector u containing the flow components is obtained from the equation

u = (A^T A)^{−1} A^T B    (4)

where A is the directional derivative matrix of the image and B is the time derivative vector. The derivatives here are simple deltas from one image to the next, with a resolution of 1/255. Since the matrix A^T A is a 2 × 2 matrix, we use Cramer's rule for the matrix inversion.
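As a concrete illustration of equation (4) with Cramer's rule on the 2 × 2 normal matrix, the C sketch below solves one LuKa window. It assumes precomputed gradient arrays Ix, Iy, It over the N pixels of the window (names are ours, for illustration), and folds the sign of the time derivative into the right-hand side.

```c
#include <stddef.h>

typedef struct { double ux, uy; } Flow;

/* Solve one LuKa window: (A^T A) u = -A^T b via Cramer's rule,
   where A stacks the (Ix, Iy) rows and b stacks the It values. */
Flow lucas_kanade_window(const double *Ix, const double *Iy,
                         const double *It, size_t N) {
    double sxx = 0, sxy = 0, syy = 0, sxt = 0, syt = 0;
    for (size_t i = 0; i < N; i++) {   /* accumulate the 2x2 normal system */
        sxx += Ix[i] * Ix[i];
        sxy += Ix[i] * Iy[i];
        syy += Iy[i] * Iy[i];
        sxt += Ix[i] * It[i];
        syt += Iy[i] * It[i];
    }
    double det = sxx * syy - sxy * sxy;
    Flow u = { 0.0, 0.0 };
    if (det != 0.0) {                  /* Cramer's rule on the 2x2 system */
        u.ux = (-sxt * syy + syt * sxy) / det;
        u.uy = (-syt * sxx + sxt * sxy) / det;
    }
    return u;
}
```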
C. Related Work

There have been several attempts at posit hardware implementation since the first proposal. The early parameterized designs were presented in [2], [4], [7], and [5]. The designs presented in [2], [4], and [7] are open-source but do not synthesize for exponent size zero, while the design presented in [5] is not open-source. A power-efficient posit multiplier is presented in [12]. The authors in [12] present a scheme that divides the fraction part of the multiplier into several chunks and uses them efficiently, resulting in 16% power efficiency over the baseline implementation.

Several posit implementations focus explicitly on machine learning applications. A performance-efficiency trade-off study for deep neural network (DNN) inference is presented in [13], where the authors discuss overall neural network efficiency and performance trade-offs. A template-based posit multiplier is presented in [14], where the authors cover both training and inference of neural networks and show that 8-bit posits are as good as floats in inference. The Deep Positron DNN architecture presented in [15] shows trade-offs between performance and hardware resources; it uses an FPGA-based soft core to control the multiply-accumulate unit hardware (fixed-point, floating-point, and posit). The Cheetah framework presented in [16] incorporates mixed-precision arithmetic alongside support for the conventional formats.

RISC-V integrations of posit arithmetic hardware are presented in PERI [17], PERC [18], and Clarinet [19]. PERI [17] uses the SHAKTI C-class core as a base to attach posit arithmetic hardware, first as a tightly-coupled unit and then as an accelerator connected through the rocket custom coprocessor (RoCC) interface. PERC [18] delves into similar aspects while using RocketCore as a base. The Flute RISC-V core from Bluespec Inc. is used for posit arithmetic experimentation in Clarinet [19], where Melodica is the tightly-coupled posit core. Clarinet has the unique feature of also supporting the quire register for exact dot products; fused multiply-accumulation is a special case of accumulation in the quire register. There also exist a couple of commercial attempts, such as the CRISP core by Calligo Technologies [20] and VividSparks [21].

Very few implementations in the literature focus on application-specific posit arithmetic tuning, in which extensive analyses are performed before delving into the hardware design. In our approach, we first emphasize application analysis, followed by RISC-V integration of the posit arithmetic unit. For our implementation of a posit adder and multiplier, we have used improved versions of the designs proposed in [5], and for the divider, we have used the design presented in [7].
Fig. 2: Optical flow in consecutive frames in (a) synthetic images, and (b) real-life images. Images on the left and right side are the consecutive input images, and the middle images represent the optical flow. None of the images are manipulated to support any particular number format.

III. LUKA USING SOFTPOSIT AND SOFTFLOAT
The study is conducted with synthetic images of a sphere (slightly rotated in each successive frame) and real-life images of a human being (slightly translated in each successive frame), as shown in Fig. 2a and 2b. It is ensured that the images are well textured and that the motion between consecutive frames is very small, eliminating the need for regularization-based methods or multi-scale estimation.

To perform the error analysis, LuKa is implemented in the C programming language. We ensure that the implementation has no dependency on any third-party or open-source libraries. The reference implementation uses double-precision floating-point arithmetic. The code is executed with all consecutive pairs of frames as inputs over the whole data set of images, and the optical flow values are obtained.
A. Accuracy analysis for 16-bit floats, fixed-point, and posits

A primary goal of this study was to compare the result accuracy of low-precision (16-bit) posit and float formats. We also test a 32-bit fixed-point format. The grey-scale pixel values (0 to 255) are first scaled by dividing by a normalization factor norm; we tested norm values ranging from 1 to 255. Each format has a preferred norm; for example, a too-small norm for floats leads to catastrophic overflow in the matrix multiplication step of the algorithm, since the largest real value 16-bit floats can represent is 65504.

For all three formats, we compare the results with a reference result. We pick the norm value that gives the smallest absolute error. Heat maps of the absolute error for both u and v are generated to visualize the distribution of error (Fig. 3). The heat maps and data presented are for the errors in the optical flow between two particular frames selected from the synthetic and real-life image data sets, representative of the whole experiment.

For the 16-bit (half-precision) float study, we use the Berkeley SoftFloat library by John Hauser, which provides an excellent, stable software implementation of this precision that conforms to the IEEE 754 Standard. All optical flow values are calculated with 16-bit floating-point variables and operations. For the 16-bit floats, the best norm value is found to be 32 for both synthetic and real-life images (we discuss the cause and implication of this in detail later in this section).

For the fixed-point implementation, we take advantage of the libfixmath fixed-point math library. As with SoftFloat, a Q16.16 implementation of the code is prepared and executed on the same input data set. Heat maps (Fig. 3) are generated for the best cases (norm factors of 28 and 18 for the synthetic and real-life images, respectively).

The posit implementation uses Cerlane Leong's SoftPosit library. It supports two 16-bit configurations, with es = 1 and es = 2; we found es = 2 the better fit for this application. The code was ported with all variables and operations changed to posits. The use of the quire, supported by the SoftPosit library, is out of the scope of this work, but in future work it may further improve the accuracy of the matrix multiplication step. Pixel values are again normalized. Optical flow values and errors are calculated as before. For posits, a norm value of 16 presents the smallest error for the synthetic images and is next-best for the real-life images.

Table I summarizes the results obtained. "Posit 16,k" refers to 16-bit posits with es = k. For the synthetic images, the maximum error for the fixed-point format is an order of magnitude higher than for the other formats. However, this is not the case with the RMS error, which is very close to the RMS error for float 16, albeit ∼4× more than for posit 16,2. This is also evident from the heat maps and confirms that the fixed-point format gives mostly accurate values with a few values of very high absolute error. Posit 16,2 has ∼3× lower maximum and RMS errors compared to float 16, while posit 16,1 has an error profile intermediate between the two formats.

For the real-life images, the results are slightly different. It should be noted that no additional filters were applied to the images before the optical flow calculations; they were extracted from video as-is. They lack the texture and sharpness of the synthetic images and are noisier in general.
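To indicate how the porting works in practice, here is a minimal sketch of the normalization and error-measurement steps using SoftPosit's 16-bit type (posit16_t, es = 1). The converters convertDoubleToP16/convertP16ToDouble are SoftPosit's, as we understand the library's API; the array names and the harness itself are our illustration rather than the paper's code.

```c
#include <math.h>
#include "softposit.h"   /* Cerlane Leong's SoftPosit emulation library */

/* Scale a grey level (0..255) by the norm factor and convert to posit,
   so values land near 1, where posits carry the most fraction bits. */
posit16_t normalize_pixel(unsigned char grey, double norm) {
    return convertDoubleToP16((double)grey / norm);
}

/* RMS error of a 16-bit posit flow field against the double-precision
   reference flow, as reported in Table I. */
double rms_error(const double *flow_ref, const posit16_t *flow_p16, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double e = flow_ref[i] - convertP16ToDouble(flow_p16[i]);
        sum += e * e;
    }
    return sqrt(sum / n);
}
```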
TABLE I: Absolute errors in optical flow

                            Fixed 16.16   Float 16   Posit 16,1   Posit 16,2
Max Error (synthetic)       0.01579       0.0047     0.00272      0.00163
RMS Error (synthetic)       0.00057       0.00049    0.00016      0.00015
Std. Deviation (synthetic)  0.00056       0.00046    0.00015      0.00046
Max Error (real-life)       5.6692        0.125      0.13412      0.08333
RMS Error (real-life)       0.12940       0.00109    0.00234      0.00108
Std. Deviation (real-life)  0.12885       0.00108    0.00233      0.00107
Fig. 3: Error heat maps for synthetic ((a), (b), and (c)) and real-life ((d), (e), and (f)) images in y and x for fixed-point, float, and posit formats (IEEE 754-2008 64-bit reference)

Fig. 4: Trend in accuracy for different normalization factors in (a) synthetic images and (b) real-life images

It is found that both the maximum and RMS errors for the fixed-point format in this case are two orders of magnitude higher than for floats and posits. Float 16 performs equivalently to posit 16,2 and better than posit 16,1 in terms of RMS error, although the maximum error for float 16 is ∼1.5× the maximum error for posit 16,2.

The summary in Table I presents the best-in-class results, but for a more generic view of the performance of the formats, the maximum errors are plotted against the normalization factors in Fig. 4 in the form of bar charts. Normalizing by 16 gives very high accuracy for posits in both data sets (best for synthetic and next-best for real-life). In other words, scaling the original pixel values from (0–255) down to (0–16) leads to a further improvement in result accuracy. This is because of the tapered accuracy property of posits; accuracy is maximized for values close to 1 in magnitude. Dividing by 16 centers the (nonzero) pixel values x in the range 1/16 ≤ x < 16. Posit 16,2 has its maximum accuracy, 12 significant bits, in exactly this range. Float 16 is consistently less accurate than posit 16,2, and fixed-point is consistently less accurate than both floats and posits. The red bars in Fig. 4(b) indicate NaN float values that are generated for norm values in the range of 1 to 8, which are too small to prevent overflow. (Posit 16,2 can represent real values up to about 7.2 × 10^16.)

Next, we delve deeper into the float 16 and posit 16,2 formats to understand why float 16 performs so well in certain regions (such as norm = 32 for floats). Data values generated from each and every intermediate arithmetic operation performed in the LuKa algorithm are collected in the reference implementation for norm values of 255 (scaling pixels to range from 0 to 1) and 32 (scaling pixels to range from 0 to 8). This is done for both synthetic and real-life images. From this intermediate data, all the unique values are extracted and analyzed. It is found that normalizing by 32 limits the dynamic range of the data values generated during the calculation, bringing them within the dynamic range of float 16 (Fig. 5 and Fig. 6). Fig. 5 and 6 also present overlapped histograms of float 16, posit 16,2, and the unique data values generated. A good overlap indicates a better number system for the application at hand. Posits have a far wider dynamic range than floats and hence perform better in general across all norm factors. For the norm factor of 32, where float 16 has adequate dynamic range, the tapered nature of the data (with a high density of values around 1) gives a slight edge to posits, resulting in a marginally lower error at that norm, though not as low as posits achieve at their optimum norm.

Fig. 5: Histogram overlap of posits, floats, and unique data values generated during the reference (double-precision) implementation run of LuKa with normalization factors of (a) 255 and (b) 32 for synthetic images

Fig. 5 also shows that a relatively smaller error in the larger data values carries more weight in the final result accuracy than a larger error in smaller data values. However, a deeper study with more applications is needed to substantiate this claim. This study shows the advantages of using posits over other formats for calculating the optical flow using the LuKa method.
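The dynamic-range argument above can be made mechanical. The bounds in the sketch below are format facts (float 16 overflows past 65504; posit 16,2 saturates at maxpos = 16^14 = 2^56 ≈ 7.2 × 10^16), and a check of this shape, applied to every intermediate value of the reference run, reproduces the overlap analysis of Fig. 5 and 6; the helper names are ours.

```c
#include <math.h>
#include <stdbool.h>

#define FLOAT16_MAX    65504.0                 /* largest finite half-precision value */
#define POSIT16_2_MAX  7.205759403792794e16    /* maxpos = useed^(n-2) = 16^14 = 2^56 */

/* True if x survives a round-trip to the format without overflow
   (float 16) or saturation at maxpos/minpos (posit 16,2). */
bool fits_float16(double x) {
    return fabs(x) <= FLOAT16_MAX;
}
bool fits_posit16_2(double x) {
    return x == 0.0 ||
           (fabs(x) <= POSIT16_2_MAX && fabs(x) >= 1.0 / POSIT16_2_MAX);
}
```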
IV. HARDWARE IMPLEMENTATION OF LUKA

We integrate a verified posit adder and multiplier into the RI5CY core of the Pulpino platform [9]. We detach the existing floating-point unit (FPU) from the RI5CY core and integrate the adder and multiplier generated by our posit arithmetic unit (PAU) generator into the core, as shown in Fig. 7. RI5CY is a 32-bit core based on the RISC-V ISA supporting floating-point instructions. We generate a 32-bit adder and multiplier using the PAU generator for the integration.
TABLE II: Adder, multiplier synthesis results (delays in ns)

                    Adder                             Multiplier
(n, es)   LUT       Logic Delay   Net Delay   LUT       Logic Delay   Net Delay
(8,0)     185 (0)   8.83          21.12       95 (1)    7.43          13.13
(8,2)     181 (0)   9.68          20.92       96 (0)    4.28          14.07
(16,1)    400 (0)   12.77         19.01       229 (1)   10.16         13.55
(16,2)    391 (0)   14.78         20.07       226 (1)   10.76         13.09
(32,2)    866 (0)   17.30         24.57       572 (4)   15.55         16.38
TABLE III: FPGA synthesis results for FPU and posit cores

Core          LUT Count   Delay (ns)
FPU [9]       2669        50 (20 MHz)
Posit (16,1)  2082        55 (18.18 MHz)
Posit (16,2)  2024        42 (23.81 MHz)
Posit (28,2)  2780        71 (14.08 MHz)
Posit (32,2)  2810        71 (14.08 MHz)
Fig. 6: Histogram overlap of posits, floats, and unique data values generated during the reference (double-precision) implementation run of LuKa with normalization factors of (a) 255 and (b) 32 for real-life images

Fig. 7: RISC-V integration of the posit core

The developed parametric posit hardware generator allows us to choose any posit configuration (n or es) and generate adder, multiplier, integer-to-posit converter (int2pos), and posit-to-integer converter (pos2int) hardware operators. The PAU has been exhaustively tested against the SoftPosit library for configurations with n ranging from 7 to 12. Furthermore, we have also tested a 16-bit configuration for billions of input combinations, mainly covering the corner cases. Based on the conclusions obtained from the optical flow study, we synthesize and integrate a (16,2) PAU core with Pulpino. The results obtained post-integration are shown in Table III. The baseline version of the RI5CY core is the version with the native IEEE 754 FPU. Switching to the (16,2)-configured PAU core affords enormous savings in data RAM usage at a tolerable loss in accuracy for our optical flow application. Integration results for other configurations of the posit core are also provided for reference. The table also shows the FPGA delay for various configurations of the PAU core. The 16-bit versions show delays of 55 ns and 42 ns, comparable to the 50 ns achieved by the baseline FPU from Pulpino.

Table II presents the detailed synthesis results of the PAU adder and multiplier. This PAU is synthesized for the Zedboard with a Xilinx Zynq-7000 SoC, using Vivado 2018.3 for the FPGA synthesis results. Both the adder and multiplier implementations are purely combinational, without any pipelining. The DSP counts are given in parentheses next to the LUT counts for all configurations. In general, reasonably good area and delay numbers are observed. We benchmark our PAU against the other published results on posit hardware in Table IV. NS marks configurations that are not synthesizable, and NR signifies results not reported in the paper. Again, the DSP counts are given in parentheses next to the LUT counts for the multiplier. The adders do not use any DSP blocks across all the implementations. The LUT count of this work shows a significant improvement over the existing parametric posit hardware generators [7], [5]. It is also more extensively tested than these previous works. [22] shows a lower area footprint and is a good candidate for our future implementations. To the best of our knowledge, [22] and this work are the best published implementations of a parametric posit hardware generator.

TABLE IV: LUT count comparison

                 Adder                       Multiplier
(n, es)   Ours   [5]   [7]    [22]   Ours      [5]       [7]       [22]
(8,0)     185    NS    NS     NR     95 (1)    NS        NS        NR
(8,2)     181    208   196    NR     96 (0)    131 (0)   123 (0)   NR
(16,1)    400    391   460    320    229 (1)   218 (1)   271 (1)   253 (1)
(16,2)    391    404   492    NR     226 (1)   223 (1)   272 (1)   NR
(32,2)    866    981   1115   745    572 (4)   572 (4)   648 (4)   469 (4)
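The exhaustive comparison against SoftPosit mentioned above can be sketched as a simple golden-model loop. Here p16_add and the posit16_t bit field v are SoftPosit's (16-bit, es = 1), while pau_add16, the hook into the synthesized adder (e.g., driven through an RTL simulator), is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include "softposit.h"

/* Hypothetical hook returning the raw 16-bit output of the synthesized PAU adder. */
extern uint16_t pau_add16(uint16_t a, uint16_t b);

/* Compare the hardware adder against SoftPosit's 16-bit golden model
   over all 2^32 input bit patterns. */
int verify_adder16(void) {
    uint32_t mismatches = 0;
    for (uint32_t a = 0; a <= 0xFFFF; a++) {
        for (uint32_t b = 0; b <= 0xFFFF; b++) {
            posit16_t pa, pb;
            pa.v = (uint16_t)a;
            pb.v = (uint16_t)b;
            uint16_t ref = p16_add(pa, pb).v;   /* SoftPosit reference result */
            if (pau_add16((uint16_t)a, (uint16_t)b) != ref)
                mismatches++;
        }
    }
    printf("mismatches: %u\n", mismatches);
    return mismatches == 0;
}
```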
V. CONCLUSION

The purpose of this work is to analyze the benefits and shortcomings of posit arithmetic versus floats in real-world applications. We have demonstrated the clear advantage of using posits instead of floats for calculating optical flow using LuKa. An order of magnitude improvement in accuracy is observed when the algorithm is implemented using posits instead of floats on synthetic images; for real-life images, the accuracy is comparable. A fixed-point approach has accuracy too low to be viable. The algorithm was then implemented in hardware on a RISC-V core modified to support posit 16,1 and 32,2. The synthesis results of the modified core, as well as the posit arithmetic unit, were presented; Pulpino performs well with a lower LUT count than with the single-precision FPU. Our PAU is also shown to be comparable (if not better) in terms of area to other state-of-the-art posit hardware.
REFERENCES

[1] J. Gustafson and I. Yonemoto, "Beating floating point at its own game: Posit arithmetic," Supercomput. Front. Innov.: Int. J., vol. 4, no. 2, pp. 71–86, June 2017.
[2] M. K. Jaiswal and H. K.-H. So, "Universal number posit arithmetic generator on FPGA," in DATE 2018, Dresden, Germany, March 19–23, 2018, pp. 1159–1162.
[3] A. Guntoro, C. De La Parra, F. Merchant, F. De Dinechin, J. L. Gustafson, M. Langhammer, R. Leupers, and S. Nambiar, "Next generation arithmetic for edge computing," in DATE 2020, pp. 1357–1365.
[4] M. K. Jaiswal and H. So, "Architecture generator for type-3 unum posit adder/subtractor," in IEEE International Symposium on Circuits and Systems (ISCAS), May 2018, pp. 1–5.
[5] R. Chaurasiya, J. Gustafson, R. Shrestha, J. Neudorfer, S. Nambiar, K. Niyogi, F. Merchant, and R. Leupers, "Parameterized posit arithmetic hardware generator," in ICCD 2018, pp. 334–341.
[6] S. Nambi, S. Ullah, A. Lohana, S. S. Sahoo, F. Merchant, and A. Kumar, "ExPAN(N)D: Exploring posits for efficient artificial neural network design in FPGA-based systems," arXiv, 2020.
[7] M. K. Jaiswal and H. K.-H. So, "PACoGen: A hardware posit arithmetic core generator," IEEE Access, vol. 7, pp. 74586–74601, 2019.
[8] B. K. P. Horn and B. G. Schunck, "Determining optical flow," Tech. Rep., Cambridge, MA, USA, 1980.
[9] M. Gautschi, P. D. Schiavone, A. Traber, I. Loi, A. Pullini, D. Rossi, E. Flamand, F. K. Gürkaynak, and L. Benini, "Near-threshold RISC-V core with DSP extensions for scalable IoT endpoint devices," IEEE TVLSI, vol. 25, no. 10, pp. 2700–2713, Oct. 2017.
[10] I. Alouani, A. Ben Khalifa, F. Merchant, and R. Leupers, "An investigation on inherent robustness of posit data representation," in Proceedings of the International Conference on VLSI Design (VLSID), Feb. 2021.
[11] H. Seong, C. E. Rhee, and H. Lee, "A novel hardware architecture of the Lucas–Kanade optical flow for reduced frame memory access," IEEE TCSVT, vol. 26, no. 6, pp. 1187–1199, June 2016.
[12] H. Zhang and S. Ko, "Design of power efficient posit multiplier," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 5, pp. 861–865, 2020.
[13] Z. Carmichael, H. F. Langroudi, C. Khazanov, J. Lillie, J. L. Gustafson, and D. Kudithipudi, "Performance-efficiency trade-off of low-precision numerical formats in deep neural networks," in Proceedings of the Conference for Next Generation Arithmetic (CoNGA'19), New York, NY, USA, 2019, Association for Computing Machinery.
[14] R. Murillo Montero, A. A. Del Barrio, and G. Botella, "Template-based posit multiplication for training and inferring in neural networks," arXiv, 2019.
[15] Z. Carmichael, H. F. Langroudi, C. Khazanov, J. Lillie, J. L. Gustafson, and D. Kudithipudi, "Deep Positron: A deep neural network using the posit number system," in DATE 2019, pp. 1421–1426.
[16] H. F. Langroudi, Z. Carmichael, D. Pastuch, and D. Kudithipudi, "Cheetah: Mixed low-precision hardware & software co-design framework for DNNs on the edge," 2019.
[17] S. Tiwari, N. Gala, C. Rebeiro, and V. Kamakoti, "PERI: A posit enabled RISC-V core," 2019.
[18] A. M V, G. Bhairathi, and H. Hayatnagarkar, "PERC: Posit enhanced Rocket Chip," May 2020.
[19] R. Jain, N. Sharma, F. Merchant, S. Patkar, and R. Leupers, "Clarinet: A RISC-V based framework for posit arithmetic empiricism," arXiv, 2020.
[20] Calligo Technologies, "Calligo RISC-V with posits (CRISP)," 2020.
[21] VividSparks, "Posit for next generation computer arithmetic," 2020.
[22] Y. Uguen, L. Forget, and F. de Dinechin, "Evaluating the hardware cost of the posit number system," in FPL 2019.