An efficient floating point multiplier design for high speed applications using Karatsuba algorithm and Urdhva-Tiryagbhyam algorithm
Abstract:
Floating point multiplication is a crucial operation in high performance computing applications such as image processing and signal processing, and multiplication is the most time- and power-consuming operation. This paper proposes an efficient method for IEEE 754 floating point multiplication that gives a better implementation in terms of delay and power. A combination of the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm (Vedic Mathematics) is used to implement an unsigned binary multiplier for mantissa multiplication. The multiplier is implemented in Verilog HDL and targeted on Spartan-3E and Virtex-4 FPGAs.
Keywords: FPGA, Floating point multiplier, Vedic mathematics, Urdhva-Tiryagbhyam, Karatsuba
I. INTRODUCTION
Floating point multiplication units are an essential IP for modern multimedia and high performance computing, such as graphics acceleration, signal processing and image processing. A lot of effort has been made over the past few decades to improve the performance of floating point computations. Floating point units are not only complex, but also require more area and hence consume more power than fixed point multipliers. The complexity of the floating point unit increases further as accuracy becomes a major issue. IEEE 754 [1] supports different floating point formats such as the Single Precision, Double Precision and Quadruple Precision formats. But as the precision increases, multiplier area, delay and power increase drastically. In this paper, we present a new multiplication method which uses a combination of the Karatsuba and Urdhva-Tiryagbhyam (Vedic Mathematics) algorithms. This combination not only reduces delay, but also reduces the percentage increase in hardware as compared to conventional methods.
The two most commonly used IEEE 754 formats are the single precision and double precision formats [1, 2]. Fig. 1 shows these formats. The Single precision format is 32 bits wide and the Double precision format is 64 bits wide. The Most Significant Bit is the sign bit. The exponent is a signed integer, which is stored as an unsigned value by adding a bias. In Single precision format, the exponent is 8 bits wide and the bias is 127, i.e. the exponent has a range of $(-127 \text{ to } 128)$. In Double precision format, the exponent is 11 bits wide and the bias is 1023, i.e. the exponent has a range of $(-1023 \text{ to } 1024)$. The mantissa or significand is 23 bits wide in Single precision format and 52 bits wide in Double precision format.
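The single precision field layout described above can be illustrated with a short Python sketch. It is not part of the proposed Verilog design; the function name `decode_single` is a hypothetical helper introduced only for this illustration, and it relies on Python's standard `struct` module to obtain the 32-bit pattern.

```python
import struct

def decode_single(x: float):
    """Split x, rounded to IEEE 754 single precision, into its three fields.

    Returns (sign, biased_exponent, fraction): 1, 8 and 23 bits wide.
    """
    # Pack as a big-endian 32-bit float, then reinterpret as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                      # most significant bit
    biased_exponent = (bits >> 23) & 0xFF  # 8-bit exponent, stored with bias 127
    fraction = bits & 0x7FFFFF             # 23 stored mantissa bits
    return sign, biased_exponent, fraction

# -6.5 is -1.625 x 2^2, so the stored exponent is 2 + 127 = 129.
sign, biased_exponent, fraction = decode_single(-6.5)
unbiased_exponent = biased_exponent - 127
```

Subtracting the bias of 127 from the stored exponent recovers the signed exponent, which is the convention the rest of the paper assumes.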
The maximum value that can be represented using a floating point format is $\text{largest significand} \times \text{base}^{\text{largest exponent}}$, and the minimum value that can be represented is $\text{smallest significand} \times \text{base}^{\text{smallest exponent}}$.
II. FLOATING POINT MULTIPLIER DESIGN
A floating point number has four parts: sign, exponent, significand (or mantissa) and the exponent base. A floating point number is represented in IEEE 754 format [1, 2] as $\pm s \times b^{e}$, or $\pm\,\text{significand} \times \text{base}^{\text{exponent}}$. The exponent base for the binary format is 2. To multiply two floating point numbers $\pm s_1 \times b^{e_1}$ and $\pm s_2 \times b^{e_2}$, the significands are multiplied to get the product mantissa and the exponents are added to get the product exponent, i.e. the product is $\pm (s_1 \times s_2) \times b^{(e_1 + e_2)}$. The hardware block diagram of the floating point multiplier is shown in Fig. 2.
Single precision: Sign (1 bit), Exponent (8 bits), Mantissa (23 bits)
Double precision: Sign (1 bit), Exponent (11 bits), Mantissa (52 bits)
Fig. 1 Floating point formats in the proposed model
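As a small worked instance of the multiplication rule above, consider the operands $1.5 \times 2^{1}$ and $1.25 \times 2^{2}$. These values are chosen here purely for illustration and do not come from the paper's test set.

```latex
\begin{aligned}
(1.5 \times 2^{1}) \times (1.25 \times 2^{2})
  &= (1.5 \times 1.25) \times 2^{(1 + 2)} \\
  &= 1.875 \times 2^{3} \\
  &= 15
\end{aligned}
```

The result agrees with the direct product $3 \times 5 = 15$. Here the significand product $1.875$ is already in the range $[1, 2)$, so no normalization shift is needed.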
R. K. Sharma, School of VLSI Design and Embedded Systems, National Institute of Technology Kurukshetra, Kurukshetra, India
Arish S, School of VLSI Design and Embedded Systems, National Institute of Technology Kurukshetra, Kurukshetra, India
Cite as:
S. Arish and R. K. Sharma, "An efficient floating point multiplier design for high speed applications using Karatsuba algorithm and Urdhva-Tiryagbhyam algorithm," 2015 International Conference on Signal Processing and Communication (ICSC), Noida, 2015, pp. 303-308. doi: 10.1109/ICSPCom.2015.7150666
The important blocks in the implementation of the proposed floating point multiplier [3] are described below.
A. Sign Calculation
The MSB of a floating point number represents the sign bit. The sign of the product is positive if both numbers have the same sign and negative if they have opposite signs. So, to obtain the sign of the product, we can use a simple XOR gate as the sign calculator.
B. Addition of Exponents
To get the product exponent, the input exponents are added together. Since each stored exponent already includes the bias, the sum contains the bias twice, so we subtract the bias once from the sum of the exponents to get the correctly biased product exponent. The value of the bias is $127_{10}$ ($01111111_{2}$) for single precision format and $1023_{10}$ ($01111111111_{2}$) for double precision format. In the proposed custom precision format also, a bias of 127 is used. The computational time of the mantissa multiplication is much more than that of the exponent addition, so a simple ripple carry adder and ripple borrow subtractor are optimal for exponent addition.
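The sign and exponent steps of subsections A and B can be sketched in a few lines of Python. This is a behavioural illustration only, assuming single precision with a bias of 127; the function names `product_sign` and `product_exponent` are hypothetical, and overflow or underflow of the exponent is not handled here.

```python
BIAS = 127  # single precision bias, as stated in the text

def product_sign(sign_a: int, sign_b: int) -> int:
    """Sign bit of the product: a single XOR of the two input sign bits."""
    return sign_a ^ sign_b

def product_exponent(biased_a: int, biased_b: int, bias: int = BIAS) -> int:
    """Biased exponent of the product before normalization.

    Each input already carries the bias once, so the sum carries it twice
    and one bias is subtracted.
    """
    return biased_a + biased_b - bias

# 2.0 has stored exponent 128 and 4.0 has stored exponent 129;
# their product 8.0 has stored exponent 130.
exponent_of_eight = product_exponent(128, 129)
```

In hardware the addition and the bias subtraction map onto the ripple carry adder and ripple borrow subtractor mentioned above.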
C. Karatsuba-Urdhva Tiryagbhyam binary multiplier
In floating point multiplication, the most important and complex part is the mantissa multiplication. A multiplication operation requires more time than an addition, and as the number of bits increases, it consumes more area and time. In double precision format we need a 53x53 bit multiplier, and in single precision format we need a 24x24 bit multiplier. These operations take a long time and are the major contributor to the delay of the floating point multiplier. To make the multiplication operation more area efficient and faster, the proposed model uses a combination of the Karatsuba algorithm and the Urdhva Tiryagbhyam algorithm.
The Karatsuba algorithm uses a divide and conquer approach in which it breaks down the inputs into a Most Significant half and a Least Significant half, and this process continues until the operands are 8 bits wide. The Karatsuba algorithm is best suited for operands of higher bit length, but at lower bit lengths it is not as efficient as it is at higher bit lengths. To eliminate this problem, the Urdhva Tiryagbhyam algorithm is used at the lower stages. The model of the Karatsuba-Urdhva-Tiryagbhyam algorithm is shown in Fig. 3.
The Urdhva Tiryagbhyam algorithm is the best algorithm for binary multiplication in terms of area and delay. But as the number of bits increases, the delay also increases, because the partial products are added in a ripple manner. For example, 4-bit multiplication requires 6 adders connected in a ripple manner, 8-bit multiplication requires 14 adders, and so on. Compensating for the delay will cause an increase in area, so the Urdhva Tiryagbhyam algorithm is not optimal when the number of bits is large. If we use the Karatsuba algorithm at the higher stages and the Urdhva Tiryagbhyam algorithm at the lower stages, the combination can partly compensate for the limitations of both algorithms, and hence the multiplier becomes more efficient. The circuit is further optimized by using carry select and carry save adders instead of ripple carry adders. This reduces the delay to a great extent with a minimal increase in hardware. These two algorithms are explained in detail in the sections below.
Fig. 2 Floating point multiplier
Fig. 3 Karatsuba-Urdhva multiplier model
Urdhva Tiryagbhyam algorithm for multiplication
The Urdhva-Tiryagbhyam sutra is an ancient Vedic mathematics method for multiplication [4, 5, 6, 7]. It is a general formula applicable to all cases of multiplication. The formula is very short, consists of only one compound word, and means 'Vertically and crosswise'. In the Urdhva Tiryagbhyam algorithm, the number of steps required for multiplication can be reduced, and hence the speed of multiplication is increased.
An illustration of the steps for computing the product of two 4-bit numbers is shown below [8, 9]. Let the two inputs be $a_3 a_2 a_1 a_0$ and $b_3 b_2 b_1 b_0$, and let $p_7 p_6 p_5 p_4 p_3 p_2 p_1 p_0$ be the product. The temporary partial products are $t_0, t_1, t_2, \ldots, t_6$. The partial products are obtained from the steps given below. The line notation of the steps is shown in Fig. 4.
Step 1: $t_0$ (1 bit) $= a_0 b_0$.
Step 2: $t_1$ (2 bit) $= a_1 b_0 + a_0 b_1$.
Step 3: $t_2$ (2 bit) $= a_2 b_0 + a_1 b_1 + a_0 b_2$.
Step 4: $t_3$ (3 bit) $= a_3 b_0 + a_2 b_1 + a_1 b_2 + a_0 b_3$.
Step 5: $t_4$ (2 bit) $= a_3 b_1 + a_2 b_2 + a_1 b_3$.
Step 6: $t_5$ (2 bit) $= a_3 b_2 + a_2 b_3$.
Step 7: $t_6$ (1 bit) $= a_3 b_3$.
The product is obtained by adding $s_1$, $s_2$ and $s_3$ as shown below, where $s_1$, $s_2$ and $s_3$ are the partial sums obtained.
$s_1 = t_6\, t_5[0]\, t_4[0]\, t_3[0]\, t_2[0]\, t_1[0]\, t_0$
$s_2 = t_5[1]\, t_4[1]\, t_3[1]\, t_2[1]\, t_1[1]$
$s_3 = t_3[2]$
With each bit placed in the column of its weight, the product is
$$
\begin{array}{rcccccccc}
  &     & t_6    & t_5[0] & t_4[0] & t_3[0] & t_2[0] & t_1[0] & t_0 \\
+ &     & t_5[1] & t_4[1] & t_3[1] & t_2[1] & t_1[1] & 0      & 0   \\
+ &     & 0      & t_3[2] & 0      & 0      & 0      & 0      & 0   \\ \hline
= & p_7 & p_6    & p_5    & p_4    & p_3    & p_2    & p_1    & p_0
\end{array}
$$
Fig. 4 Line notation of Urdhva Tiryagbhyam sutra
This method can be further optimized to reduce the amount of hardware. A more optimized hardware architecture [9, 10] is shown in Fig. 5. This model eliminates the need for a three-operand 7-bit adder and hence reduces hardware and delay. The adders are connected in a ripple manner. The expressions for the product bits are shown below, where MSB(ADDER k) denotes the bits of the sum of ADDER k above its LSB.
$p_0 = a_0 b_0$
$p_1 = \text{LSB of Sum(ADDER 1)} = \text{LSB of } (a_1 b_0 + a_0 b_1)$
$p_2 = \text{LSB of Sum(ADDER 2)} = \text{LSB of } (\text{MSB(ADDER 1)} + a_2 b_0 + a_1 b_1 + a_0 b_2)$
$p_3 = \text{LSB of Sum(ADDER 3)} = \text{LSB of } (\text{MSB(ADDER 2)} + a_3 b_0 + a_2 b_1 + a_1 b_2 + a_0 b_3)$
$p_4 = \text{LSB of Sum(ADDER 4)} = \text{LSB of } (\text{MSB(ADDER 3)} + a_3 b_1 + a_2 b_2 + a_1 b_3)$
$p_5 = \text{LSB of Sum(ADDER 5)} = \text{LSB of } (\text{MSB(ADDER 4)} + a_3 b_2 + a_2 b_3)$
$p_6 = \text{LSB of Sum(ADDER 6)} = \text{LSB of } (\text{MSB(ADDER 5)} + a_3 b_3)$
$p_7 = \text{Carry of ADDER 6}$
Fig. 5 Hardware architecture for 4x4 Urdhva Tiryagbhyam multiplier.
Since there are more than two operands in adders 2 to 5, we can use carry save addition to implement adders 2 to 5. This technique reduces the delay to a great extent compared to the ripple carry adder.
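The column-by-column structure of the optimized Urdhva Tiryagbhyam multiplier can be checked with a small behavioural model in Python. This is a sketch rather than the authors' Verilog: the function name `urdhva_multiply` and the `width` parameter are assumptions made for illustration, and each loop iteration stands in for one adder of the ripple chain (the first and last columns degenerate to a single AND term).

```python
def urdhva_multiply(a: int, b: int, width: int = 4) -> int:
    """Multiply two unsigned `width`-bit integers column by column.

    Column k sums every crosswise term a_i * b_j with i + j == k plus the
    carry from column k - 1. The LSB of that sum is product bit p_k and the
    remaining upper bits ripple into the next column.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in `width` bits")
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]

    product = 0
    carry = 0
    for k in range(2 * width - 1):
        column = carry
        for i in range(width):
            j = k - i
            if 0 <= j < width:
                column += a_bits[i] & b_bits[j]  # one AND gate per term
        product |= (column & 1) << k  # LSB of the column sum is p_k
        carry = column >> 1           # upper bits feed the next adder
    # The carry out of the last column is the top product bit.
    product |= carry << (2 * width - 1)
    return product
```

For `width = 4` the loop reproduces the expressions for $p_0$ to $p_7$ given above; an exhaustive comparison against ordinary integer multiplication over all 4-bit operand pairs is a convenient sanity check.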
Karatsuba Algorithm for multiplication
The Karatsuba multiplication algorithm [11, 12] is best suited for multiplying very large numbers. This method was published by Anatoli Karatsuba in 1962. It is a divide and conquer method, in which we divide the numbers into their Most Significant half and Least Significant half and then perform the multiplication. The Karatsuba algorithm reduces the number of multipliers required by replacing multiplication operations with addition operations. Addition operations are faster than multiplications, and hence the speed of the multiplier is increased. As the number of bits of the inputs increases, the Karatsuba algorithm becomes more efficient. This algorithm is optimal if the width of the inputs is more than 16 bits. The hardware architecture of the Karatsuba algorithm is shown in Fig. 6. The Karatsuba algorithm for two inputs X and Y can be explained as follows.
$\text{Product} = X \cdot Y$
X and Y can be written as,
$X = 2^{n/2} \cdot X_l + X_r$ (1)
$Y = 2^{n/2} \cdot Y_l + Y_r$ (2)
where $X_l$, $Y_l$ and $X_r$, $Y_r$ are the Most Significant half and Least Significant half of X and Y respectively, and n is the number of bits. Then,
$X \cdot Y = (2^{n/2} \cdot X_l + X_r) \cdot (2^{n/2} \cdot Y_l + Y_r) = 2^{n} \cdot X_l Y_l + 2^{n/2} (X_l Y_r + X_r Y_l) + X_r Y_r$ (3)
The second term in equation (3) can be optimized to reduce the number of multiplication operations, i.e.
$X_l Y_r + X_r Y_l = (X_l + X_r)(Y_l + Y_r) - X_l Y_l - X_r Y_r$ (4)
Equation (3) can then be re-written as,
$X \cdot Y = 2^{n} \cdot X_l Y_l + X_r Y_r + 2^{n/2} \left( (X_l + X_r)(Y_l + Y_r) - X_l Y_l - X_r Y_r \right)$ (5)
The recurrence of the Karatsuba algorithm is,
$T(n) = 3T(n/2) + O(n) \approx O(n^{1.585})$
Fig. 6 Karatsuba multiplier
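Equation (5) can be sketched recursively in Python to show how three half-width multiplications replace four. This is an illustrative model, not the synthesized design: the function name `karatsuba` and the `base_width` parameter are assumptions, and the built-in `*` operator stands in for the lower-stage Urdhva Tiryagbhyam multiplier once the operands are 8 bits wide or narrower.

```python
def karatsuba(x: int, y: int, n: int, base_width: int = 8) -> int:
    """Multiply unsigned integers x and y of at most n bits using equation (5)."""
    if n <= base_width:
        return x * y  # stand-in for the lower-stage multiplier

    half = n // 2
    mask = (1 << half) - 1
    x_l, x_r = x >> half, x & mask  # most / least significant halves of X
    y_l, y_r = y >> half, y & mask  # most / least significant halves of Y

    high = karatsuba(x_l, y_l, n - half, base_width)  # X_l * Y_l
    low = karatsuba(x_r, y_r, half, base_width)       # X_r * Y_r
    # (X_l + X_r)(Y_l + Y_r) - X_l*Y_l - X_r*Y_r equals X_l*Y_r + X_r*Y_l.
    # The two sums can be one bit wider than the halves.
    cross = karatsuba(x_l + x_r, y_l + y_r, n - half + 1, base_width) - high - low

    return (high << (2 * half)) + (cross << half) + low
```

A call such as `karatsuba(x, y, 24)` corresponds to the single precision mantissa width, and `karatsuba(x, y, 53)` to the double precision one. For odd `n` the split point is `n // 2`, which keeps the identity exact even though the two halves differ in width by one bit.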
D. Normalization of the result
Floating point representations have a hidden bit in the mantissa, which always has the value 1 and hence is not stored in memory, saving one bit. The leading 1 in the mantissa is considered to be the hidden bit, i.e. the 1 immediately to the left of the binary point. Normalization is usually done by shifting, so that the MSB of the mantissa becomes nonzero, and in radix 2 nonzero means 1. The binary point in the mantissa multiplication result is shifted left if the leading 1 is not immediately to the left of the binary point, and for each such shift the exponent value is incremented by one. This is called normalization of the result. Since the value of the hidden bit is always 1, it is called the 'hidden 1'.
E. Representation of exceptions
Some numbers cannot be represented with a normalized significand. To represent those numbers, a special code is assigned to each of them. In the proposed model, we use four output signals, namely Zero, Infinity, NaN (Not-a-Number) and Denormal, to represent these exceptions. If the product has $\text{exponent} + \text{bias} = 0$ and $\text{significand} = 0$, then the result is taken as Zero ($\pm 0$). If the product has $\text{exponent} + \text{bias} = 255$ and $\text{significand} = 0$, then the result is taken as Infinity ($\infty$). If the product has $\text{exponent} + \text{bias} = 255$ and $\text{significand} \neq 0$, then the result is taken as NaN. Denormalized values or Denormals are numbers without a hidden 1 and with the smallest possible exponent. Denormals are used to represent certain small numbers that cannot be represented as normalized numbers. If the product has $\text{exponent} + \text{bias} = 0$ and $\text{significand} \neq 0$, then the result is represented as Denormal. A Denormal is represented as $\pm 0.s \times 2^{-126}$, where s is the significand.
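The normalization step and the four exception flags described above can be modelled as follows. This sketch assumes single precision field widths and simple truncation of the low product bits (the paper lists rounding as future work); the function names `normalize_product` and `classify` are hypothetical and introduced only for illustration.

```python
def normalize_product(mant_product: int, biased_exp: int, frac_bits: int = 23):
    """Normalize the product of two significands that include the hidden 1.

    Each input significand has frac_bits + 1 bits and represents a value in
    [1, 2), so the product lies in [1, 4) and needs at most one shift.
    Returns (fraction, biased_exp) with the hidden 1 removed.
    """
    if mant_product >> (2 * frac_bits + 1):
        # Product is in [2, 4): move the binary point one place left.
        mant_product >>= 1
        biased_exp += 1
    # Keep the frac_bits bits below the leading 1; lower bits are truncated.
    fraction = (mant_product >> frac_bits) & ((1 << frac_bits) - 1)
    return fraction, biased_exp

def classify(biased_exp: int, fraction: int, exp_bits: int = 8) -> str:
    """Map a biased exponent and fraction to the exception signals in the text."""
    max_exp = (1 << exp_bits) - 1  # 255 for single precision
    if biased_exp == 0:
        return "Zero" if fraction == 0 else "Denormal"
    if biased_exp == max_exp:
        return "Infinity" if fraction == 0 else "NaN"
    return "Normal"

# 1.5 x 1.5 = 2.25 = 1.125 x 2^1: one normalization shift is needed.
significand_1_5 = 0xC00000  # binary 1.1 with the hidden 1 in bit 23
fraction, exponent = normalize_product(significand_1_5 * significand_1_5, 127)
```

In the example the biased exponent moves from 127 to 128 and the stored fraction becomes `0x100000`, which encodes the fractional part 0.125 of the normalized significand 1.125.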
III. IMPLEMENTATION AND RESULTS
The main objective of this paper is to design and implement a floating point multiplier that is efficient in both delay and area. Since mantissa multiplication is the most complex part of the floating point multiplier, we designed a multiplier which can operate at high speed and whose delay and area increase significantly less as the number of bits increases. A floating point multiplier in IEEE 754 standard format is implemented in Verilog HDL and tested. The multiplier units are further optimized by replacing simple adders with efficient adders such as carry select adders and carry save adders. The model is synthesized and simulated using Xilinx Synthesis Tools (ISE 14.7), targeted on Spartan-3E and Virtex-4 FPGAs. The summary of results on the Virtex-4 FPGA is given in Table I and Table II. Comparison with various multiplier units is given in Tables III, IV, V, VI and VII.
IV. CONCLUSION AND FUTURE WORK
This paper shows how to effectively reduce the percentage increase in delay and area of a floating point multiplier by using a very efficient combination of the Karatsuba and Urdhva-Tiryagbhyam algorithms. The model can be further optimized in terms of delay by using pipelining methods, and the precision of the result can be increased by adding efficient truncation and rounding methods.
REFERENCES
[1] IEEE 754-2008, IEEE Standard for Floating-Point Arithmetic, 2008.
[2] Behrooz Parhami, Computer Arithmetic, Oxford University Press, 2000.
[3] B. Jeevan, S. Narender, C.V. Krishna Reddy, K. Sivani, “A High Speed Binary Floating Point Multiplier Using Dadda Algorithm”, International Multi-Conference on Automation, Computing, Communication, Control and Compressed Sensing, pp. 455-460, 2013.
[4] Swami Sri Bharati Krsna Thirthaji Maharaja, “Vedic mathematics”, Motilal Banarasidass Indological publishers and Book sellers, 1965.
[5] R. Sridevi, Anirudh Palakurthi, Akhila Sadhula, Hafsa Mahreen, “Design of a High Speed Multiplier (Ancient Vedic Mathematics Approach)”, International Journal of Engineering Research (ISSN: 2319-6890), Volume No. 2, Issue No. 3, pp. 183-186, July 2013.
TABLE I
Performance analysis of Karatsuba-Urdhva multipliers
$f_{max}$ (MHz): 274.469, 248.964, 226.508, 209.606
Logic levels: 14, 22, 31, 39
TABLE II
Performance analysis of floating point multipliers in the proposed model
Slices, LUTs, IOBs, Delay (ns), $f_{max}$ (MHz), Max. comb. path delay (ns)
Single precision: 977, 1073, 97, 16.182, 226.508, 9.831
Double precision
Ref. [8] Ref. [9] Ref. [13] Proposed multiplier Width
Delay
TABLE IV Delay comparison of various 16-bit multipliers with proposed Karatsuba-Urdhva multiplier
Ref. [14]-vedic multiplier Ref. [7] Proposed multiplier Width
Delay
TABLE V
Delay and area comparison of 24-bit multipliers with proposed Karatsuba-Urdhva multiplier
Slices, LUTs, Delay
Ref. [15]
Proposed multiplier: 972, 1018, 12.996 ns
TABLE VI
Delay and area comparison of 32-bit multipliers with proposed Karatsuba-Urdhva multiplier
LUTs Delay
Ref. [14]-
Modified Booth multiplier (Radix-8)
Modified Booth multiplier (Radix-16)
Proposed multiplier
TABLE VII
Delay and area comparison of SP floating point multipliers with proposed SP FP multiplier
Slices, LUTs, Delay
Ref. [15]
Ref. [3]
Proposed multiplier: 977, 1073, 16.182 ns
[6] Nivedita A. Pande, Vaishali Niranjane, Anagha V. Choudhari, “Vedic Mathematics for Fast Multiplication in DSP”, International Journal of Engineering and Innovative Technology (IJEIT), Volume 2, Issue 8, pp. 245-247, February 2013.
[7] R.K. Bathija, R.S. Meena, S. Sarkar, Rajesh Sahu, “Low Power High Speed 16x16 bit Multiplier using Vedic Mathematics”, International Journal of Computer Applications (0975 – 8887), Volume 59, No. 6, pp. 41-44, December 2012.
[8] Poornima M, Shivaraj Kumar Patil, Shivukumar, Shridhar K P, Sanjay H, “Implementation of Multiplier using Vedic Algorithm”, International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume 2, Issue 6, pp. 219-223, May 2013.
[9] Premananda B.S., Samarth S. Pai, Shashank B., Shashank S. Bhat, “Design and Implementation of 8-Bit Vedic Multiplier”, International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, Vol. 2, Issue 12, pp. 5877-5882, December 2013.
[10] Harpreet Singh Dhillon, Abhijit Mitra, “A Reduced-Bit Multiplication Algorithm for Digital Arithmetic”, World Academy of Science, Engineering and Technology, Vol. 19, pp. 719-724, 2008.
[11] N. Anane, H. Bessalah, M. Issad, K. Messaoudi, “Hardware implementation of Variable Precision Multiplication on FPGA”, 4th International Conference on Design & Technology of Integrated Systems in Nanoscale Era, pp. 77-81, 2009.
[12] Anand Mehta, C. B. Bidhul, Sajeevan Joseph, Jayakrishnan P., “Implementation of Single Precision Floating Point Multiplier using Karatsuba Algorithm”, 2013 International Conference on Green Computing, Communication and Conservation of Energy (ICGCE), pp. 254-256, 2013.
[13] R. Sai Siva Teja, A. Madhusudhan, “FPGA Implementation of Low-Area Floating Point Multiplier Using Vedic Mathematics”, International Journal of Emerging Technology and Advanced Engineering, ISSN 2250-2459, Volume 3, Issue 12, pp. 362-366, December 2013.
[14] Jagadeshwar Rao M, Sanjay Dubey, “A High Speed and Area Efficient Booth Recoded Wallace Tree Multiplier for fast Arithmetic Circuits”, 2012 Asia Pacific Conference on Postgraduate Research in Microelectronics & Electronics (PRIMEASIA), pp. 220-223, 2012.
[15] Anna Jain, Baisakhy Dash, Ajit Kumar Panda, Muchharla Suresh, “FPGA Design of a Fast 32-bit Floating Point Multiplier Unit”, International Conference on Devices, Circuits and Systems (ICDCS), pp. 545-547, 2012.