An Integer Approximation Method for Discrete Sinusoidal Transforms
R. J. Cintra*

Abstract
Approximate methods have been considered as a means to the evaluation of discrete transforms. In this work, we propose and analyze a class of integer transforms for the discrete Fourier, Hartley, and cosine transforms (DFT, DHT, and DCT), based on simple dyadic rational approximation methods. The introduced method is general, applicable to several blocklengths, whereas existing approaches are usually dedicated to specific transform sizes. The suggested approximate transforms enjoy low multiplicative complexity, and the orthogonality property is achievable via matrix polar decomposition. We show that the obtained transforms are competitive with methods archived in the literature. New 8-point square wave approximate transforms for the DFT, DHT, and DCT are also introduced as particular cases of the introduced methodology.
Keywords
Approximate transforms, discrete sinusoidal transforms, low-complexity transforms, nonorthogonal transforms, orthogonalization
1 Introduction

Discrete transforms play a significant role in digital signal processing. Among the possible discrete transforms, those based on sinusoidal transformation kernels occupy a prominent position. Examples of discrete sinusoidal transforms include the discrete Fourier transform (DFT), the discrete Hartley transform (DHT), and the discrete cosine transform (DCT) [1, 2].

Mathematically, discrete sinusoidal transforms relate two n-dimensional vectors v and v′, possibly defined over the complex number field, according to the formalism below:

v′_k = Σ_{i=0}^{n−1} v_i · ker(i, k, n),   k = 0, 1, ..., n − 1,   (1)

v_i = Σ_{k=0}^{n−1} v′_k · ker^{−1}(i, k, n),   i = 0, 1, ..., n − 1,   (2)

where ker(·,·,·) and ker^{−1}(·,·,·) are known as the forward and inverse transformation kernels, respectively. Table 1 lists some possible kernel functions; the Kronecker delta δ_k was employed.

Although a variety of factors contribute to the computational complexity of a numerical method [1], the number of required arithmetical operations is frequently utilized as a measure of complexity. When computed directly according to Equations (1) and (2), discrete sinusoidal transforms require a number of multiplications and additions in O(n²). Thus, direct computation may not be practical.

Additionally, kernels can be complex-valued, which requires further arithmetical considerations. The DFT kernel is a notable example.

* R. J. Cintra is with the Signal Processing Group, Departamento de Estatística, Universidade Federal de Pernambuco. E-mail: [email protected]

Moreover, the necessary values
Table 1: Some sinusoidal unitary kernels

  Transform   kernel ker(i, k, n)
  Fourier     (1/√n) · exp(−2π j i k / n)
  Hartley     (1/√n) · cas(2π i k / n)
  Cosine      √(2/n) · [1 − (1 − 1/√2) δ_k] · cos(π k (2i + 1) / (2n))

are often irrational numbers. Therefore, computations described in Equations (1) and (2) not only require a significant amount of operations, but they are also expected to handle floating-point representation over a possibly complex field.

Fast algorithms constitute a collection of methods aiming at dramatically reducing arithmetical complexity figures. For discrete transforms, classical methods usually attain such minimization by means of (i) divide-and-conquer strategies [3]; (ii) matrix factorization schemes [4]; and (iii) convolution methods [5].

Efforts have been directed to the reduction of the multiplicative complexity. This is explained in part by the well-developed multiplicative complexity theory, championed by Winograd [6] and Heideman [7]. Such developments allowed the prediction of theoretical lower bounds for the number of multiplications required by some discrete transforms [7, 8]. Much of the research in this field is concerned with the design of algorithms that can be considered "optimal" in a multiplicative complexity measurement.

However, bounded by theoretical constraints, exact methods could only achieve the prognosticated complexity minima at best. Even in such an optimal scenario, floating-point operations are involved. In particular, floating-point multiplications are known to possess relatively slow implementations, even in hardware [9].

One way to circumvent a possibly significant multiplicative complexity is to consider not exact, but approximate, computations. In this case, the theoretical limits on the multiplicative complexity do not apply. A trade-off between complexity and accuracy may take place.

Several approximation methods have been proposed in the literature.
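To make the kernels of Table 1 and the direct evaluation of Equation (1) concrete, the sketch below implements both; the function and variable names are ours, not the paper's. The O(n²) nested summation is exactly the "direct computation" whose cost motivates fast and approximate algorithms.

```python
import numpy as np

def kernel(i, k, n, kind):
    """Forward unitary kernels of Table 1 (kind in {'fourier', 'hartley', 'cosine'})."""
    if kind == 'fourier':
        return np.exp(-2j * np.pi * i * k / n) / np.sqrt(n)
    if kind == 'hartley':
        t = 2 * np.pi * i * k / n
        return (np.cos(t) + np.sin(t)) / np.sqrt(n)   # cas(t) = cos(t) + sin(t)
    if kind == 'cosine':
        ck = 1 / np.sqrt(2) if k == 0 else 1.0        # realizes 1 - (1 - 1/sqrt(2)) * delta_k
        return np.sqrt(2 / n) * ck * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    raise ValueError(kind)

def direct_transform(v, kind):
    """Direct O(n^2) evaluation of Equation (1)."""
    n = len(v)
    return np.array([sum(v[i] * kernel(i, k, n, kind) for i in range(n))
                     for k in range(n)])

v = np.arange(8.0)
# The unitary DFT kernel above matches numpy's FFT up to the 1/sqrt(n) normalization:
assert np.allclose(direct_transform(v, 'fourier'), np.fft.fft(v) / np.sqrt(8))
```

As a design note, the unitary normalizations above make each transform matrix orthogonal (or unitary), so the inverse kernel in Equation (2) is simply the conjugate of the forward one.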
Arithmetic transform procedures explore nonuniform sampling and number-theoretic functions to devise multiplication-free algorithms for the DFT [10], the DHT [11], and the DCT [12]. Approximations for the DHT based on Ramanujan numbers [13] and on wavelets [14] were also suggested. The DCT computation was shown to be approximated in many ways. In particular, integer approximations are a significant category of methods [15–21]. In [22], the DFT was submitted to an integer approximation study as well.

Integer approximation procedures constitute a class of practical interest. Essentially, these methods take advantage of the fast computation of integer arithmetic, when compared to floating-point manipulations. With the use of dyadic rational approximations [2] and the canonical signed digit representation [23], integer multiplications can be elegantly converted into combinations of additions and bit-shifting operations. As a consequence, multiplication counts are virtually zeroed and, in their place, the numbers of additions and shifts are often quantified.

Another aspect of this discussion concerns the usual requirements of orthogonality and perfect reconstruction. Much emphasis has been put on these properties, which frequently impose challenging design constraints for integer approximation algorithms [2]. In particular, orthogonal integer approximations for large blocklengths can be difficult to obtain. Several existing design procedures require the solution of large constrained non-linear optimization problems in the integer domain. Even for small blocklengths and considering exhaustive search, solutions are not trivial [2, 22].

On the other hand, nonorthogonal methods are becoming increasingly popular [24]. Classes of nonorthogonal transforms have been defined [25] and algorithms for designing nonorthogonal bases have been considered [26]. Nonorthogonal, but closely orthogonal, matrices have found applications in soft clustering analysis [27].
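As an illustration of the additions-and-shifts arithmetic mentioned above (a sketch of our own, not code from the cited works): multiplication by a dyadic rational c/2^m can be realized without multipliers by writing the integer numerator c in canonical signed digit (CSD) form, so each nonzero digit costs one shifted addition or subtraction.

```python
def csd_digits(c):
    """Canonical signed digit representation of a positive integer c
    (digits in {-1, 0, +1}, least significant first; no two adjacent nonzeros)."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)      # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def dyadic_mul(x, c, m):
    """Compute x * c / 2^m for integers x, c using only shifts and additions."""
    acc = 0
    for k, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << k
        elif d == -1:
            acc -= x << k
    return acc / 2 ** m          # the final scaling is a right shift in fixed point

# 7 = 8 - 1 in CSD form, so multiplying by 7/4 costs one shift and one subtraction:
assert dyadic_mul(12, 7, 2) == 12 * 7 / 4
```

This is precisely why the dyadic rational matrices introduced later have essentially zero multiplicative complexity.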
Recently, blind source separation procedures were given a comprehensive treatment with nonorthogonal matrices [28]. Nonorthogonal basis images were also explored as a means to provide better representation methods for compressed images [29].

In this context, the goal of the present work is the proposal of an integer approximation method for discrete sinusoidal transforms based on dyadic rational approximations. In this study, we initially relax, but do not neglect, orthogonality and perfect reconstruction constraints. Subsequently, we submit the proposed approximate transforms to a convenient orthogonalization method based on the matrix polar decomposition [30]. Related fast algorithms are suggested. We also aim at introducing new square wave transforms in a comparable fashion as studied in [2, 31–33].

The paper is organized as follows. In Section 2, a dyadic rational approximation of the cosine function is examined. Afterwards, integer approximations for the DFT, DHT, and DCT matrices are proposed; and an optimized global scaling factor is considered. In Section 3, the inverse transformation and orthogonality issues are discussed; an error analysis is also derived. Section 4 suggests some potential applications for the proposed approximations. Finally, Section 5 concludes the paper.

2 Dyadic rational approximation

The nearest integer function offers a possible venue to map the values of the transformation kernel into dyadic rational numbers. This function simply returns integer values according to the following construction:

[x] ≜ sgn(x) ⌊|x| + 1/2⌋,   (3)

where ⌊·⌋ is the floor function, sgn(·) is the sign function, and |·| returns the absolute value. This definition is in agreement with the implementation of the rounding algorithm available in the standard mathematical library of the C language. A complex number z = x + jy, where x, y ∈ ℝ, is rounded off according to [z] ≜ [x] + j[y].
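The rule of Equation (3) is round-half-away-from-zero, as implemented by round() in the C standard library; note that it differs from Python's built-in round(), which rounds ties to even. A minimal sketch (function names are ours):

```python
import math

def nearint(x):
    """Equation (3): [x] = sgn(x) * floor(|x| + 1/2), i.e. round half away from zero."""
    return math.copysign(math.floor(abs(x) + 0.5), x)

def nearint_c(z):
    """Componentwise rounding of a complex number: [z] = [Re z] + j [Im z]."""
    return complex(nearint(z.real), nearint(z.imag))

assert nearint(2.5) == 3 and nearint(-2.5) == -3   # C-style rounding
assert round(2.5) == 2                             # Python rounds ties to even
```

The distinction only matters at exact half-integer multiples, but it is worth keeping in mind when reproducing the matrices below with numerical software.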
More generally, let the mth order dyadic rational approximating function be defined as

[x]_m ≜ [2^m x] / 2^m,   (4)

where x is a real number and m is a nonnegative integer. When m =
0, the approximating function [·]_0 is equal to the nearest integer function. Intuitively, as m → ∞, [·]_∞ becomes the identity function. Thus, the mth order dyadic rational approximation of a given kernel function can be obtained according to

ker_m(i, k, n) ≜ [ker(i, k, n)]_m.   (5)

As a consequence, the transform vector v′ can be approximated as v̂′, whose components are expressed as follows:

v̂′_k = Σ_{i=0}^{n−1} v_i · ker_m(i, k, n),   k = 0, 1, ..., n −
1.   (6)

For m = 0, the approximation is derived via the usual rounding-off operation. In this case, the values of ker_0(·,·,·) are 0 or ±1. Although this mapping can be regarded as a coarse approximation for ker(·,·,·), it has zero multiplicative complexity, and only addition operations are necessary to render the approximate transformed signal. Another particularly interesting case occurs when m = 1. In this situation, ker_1(·,·,·) returns only 0, ±1, or ±2^{−1}. Thus, only additions and simple bitwise shift operations are employed to obtain the approximate transform. For higher values of m, the canonical signed digit representation could be applied to furnish multiplierless computations for the quantities displayed in Equation (6). Clearly, as m → ∞, ker_m(·,·,·) → ker(·,·,·), where the limiting process indicates pointwise convergence [34].

Obviously, different functions can possess the same nearest integer approximation. Thus, we may examine how good the approximations given by the function [·]_m are. Due to its ubiquity in discrete transform theory, let us consider the cosine function to be approximated. A multiplicative scaling factor α is also included for an additional degree of freedom. Adopting the mean square error as the objective function to be minimized, we can set up the following unconstrained optimization problem:

min_α ∫_{−π}^{π} (α cos(x) − [α cos(x)]_m)² dx,   (7)

for a fixed m. Routine manipulations, which employ standard optimization methods, show that the optimal scaling factor is a root of the following non-linear equation:

π α² = 2^{1−2m} Σ_{k=0}^{2^m − 1} √(2^{2(m+1)} α² − (2k+1)²).   (8)

For m = 0, the optimal scaling factor can be expressed in closed form. For higher values of m, analytical computations are beyond purpose; Table 2 lists optimal values obtained by numerical methods. When m → ∞, the optimal scaling factor collapses to 1, as expected. As a byproduct, in this case, Equation (8) furnishes an infinite summation formula for the value of π when α = 1. One can then define the adjusted kernel approximation [α · ker(·,·,n)]_m for an optimal value of α. Since the optimal values are close to unity, for operational purposes, we could admit α = 1. Figure 1(a) displays the plots of [cos(t)]_m, for m = 0 and m = 1, together with cos(t), over the interval [−π, π]. In this case, the mean square error due to approximating the cosine function as [cos(t)]_m, −π < t ≤ π, has a closed formula given by:

MSE([cos(t)]_m, cos(t)) = π − 2^{2−2m} Σ_{k=0}^{2^m − 1} √(2^{2(m+1)} − (2k+1)²) + 2^{2−2m} Σ_{k=0}^{2^m − 1} (2k+1) cos^{−1}((2k+1) / 2^{m+1}).   (9)

Figure 1(b) depicts the above mean square error calculated for 0 ≤ m ≤
6.

Henceforth, we adopt matrix notation. Thus, let the Fourier, Hartley, and cosine transformation matrices be defined, in their unitary form, as

F_n = (1/√n) { exp(−2π j i k / n) }_{i,k=0}^{n−1},   (10)

H_n = (1/√n) { cas(2π i k / n) }_{i,k=0}^{n−1},   (11)

C_n = √(2/n) { [1 − (1 − 1/√2) δ_k] cos(π k (2i+1) / (2n)) }_{i,k=0}^{n−1},   (12)

respectively. In account of the previous discussion, we define the mth order dyadic rational matrices associated to F_n, H_n, and C_n according to

F_n^(m) ≜ [√n · F_n]_m,   (13)

H_n^(m) ≜ [√(n/2) · H_n]_m,   (14)

C_n^(m) ≜ [√(n/2) · C_n]_m,   (15)

where the operator [·]_m, when applied to matrices, acts componentwise. Prior to the application of the dyadic rational approximation procedure, the elements of the original transform matrices were subject to a normalization by √n or √(n/2). Moreover, when scaled by 2^m, the resulting matrices are constituted of integer numbers only. For instance, when m = 0, the resulting matrix entries are −1, 0, or +1.

Indeed, the proposed dyadic rational matrices can furnish approximations for F_n, H_n, and C_n. A possible way to obtain such approximations is by the inclusion of a scaling factor, as suggested in [2, p. 275]. Scaling factors are introduced in such a way to minimize a chosen error measure between an original transformation matrix and its approximation. In matrix terms, the dyadic rational matrices link a vector v to its approximate transform v̂′, according to the following expression:

v̂′ = β · K_n^(m) v,   (16)

where K_n^(m) ∈ {F_n^(m), H_n^(m), C_n^(m)}, for a fixed m, and β is a real number. The quantity β provides a global adjustment in such a way that β K_n^(m) satisfactorily approximates K_n ∈ {F_n, H_n, C_n}, respectively. Thus, we have set the following unconstrained optimization problem:

min_β Error(K_n, β K_n^(m)),   (17)

where Error(·,·) quantifies the dissimilarity between its arguments, according to a selected measure. Adopting the Frobenius norm of the matrix difference K_n − β K_n^(m) as the objective function to be minimized [35, p. 523], we obtain optimal values for the scaling factor β using conventional optimization procedures. Tables 3, 4, and 5 show the numerically computed optimal values of β for selected practical blocklengths at various approximation orders.

Figure 1: (a) Cosine function (dash-dotted line) and two approximations: [cos(t)]_0 (solid line) and [cos(t)]_1 (dashed line), over the interval −π < t ≤ π. (b) Mean square error of [cos(t)]_m with respect to cos(t) in the interval −π < t ≤ π for several values of m.

Other error measures, such as the spectral norm or the p-norm for p ≠
2, could be considered instead of the Frobenius norm. In that case, different, but comparable, values of β would be found. Since β is an overall scaling factor, it does not affect further mathematical considerations on the nature of the approximate matrices K_n^(m).

For illustrative purposes, consider a pure sinusoidal signal v_i of length 128. Figure 2 displays the exact DHT spectrum of v, compared to the approximations given by the discussed method for m = 0, 1, ..., 4. As m increases, the approximations become more accurate. Approximations for the DFT and the DCT showed similar behavior.

3 Inverse transformation and orthogonality

Although in several applications, such as pattern classification based on transform domain feature extraction [36] and transform adaptive filtering [37, p. 154], only the forward transform is required, we investigate the inverse transformation of the proposed approximate method. The function [·]_m imposes inherent analytical difficulties. Consequently, it may not be obvious to establish whether, for any given value of n and m, the inverse matrix (K_n^(m))^{−1} exists. Thus, we resort to exhaustive computational search. For all considered blocklengths and 0 ≤ m ≤
6, the inverse of K_n^(m) was always found to exist. The evaluation of the condition number is adopted as a means to assess how well-conditioned these matrices are in terms of matrix inversion [38]. Indeed, the condition number measures the sensitivity of matrix inversion [39]. For the considered search space, the 2-norm condition number of K_n^(m) is small, never exceeding 2.5295, 2.5295, and 2.9432, for the Fourier, Hartley, and cosine approximate matrices, respectively. Such low values of the condition number indicate well-conditioned matrices. Figure 3 depicts the values of the condition number of F_n^(m) as a function of the transform size n. Approximate matrices associated to the Fourier and Hartley transforms had identical condition numbers, and the matrices C_n^(m) presented similar values. For comparison, notice that since the exact Fourier, Hartley, and cosine matrices are unitary, their condition numbers are equal to one for any blocklength.

Thus, for practical purposes, assuming that the inverse of K_n^(m) is well-defined, the following manipulation holds true:

(K_n^(m))^H (K_n^(m) (K_n^(m))^H)^{−1} K_n^(m) = I_n,   (18)

where the superscript H indicates Hermitian transposition and I_n is the identity matrix of size n. Consequently, we conclude that

(K_n^(m))^{−1} = (K_n^(m))^H (K_n^(m) (K_n^(m))^H)^{−1}.   (19)

Strictly, the proposed approximate matrices lack the unitary property, since (K_n^(m))^{−1} ≠ (K_n^(m))^H. An extra multiplicative term D^{−1} ≜ (K_n^(m) (K_n^(m))^H)^{−1} is necessary to furnish the matrix inversion, which enables perfect signal reconstruction. We have then obtained the following set of relations:

v̂′ = β · K_n^(m) v,   (20)

v = β^{−1} · (K_n^(m))^H D^{−1} v̂′.   (21)

Table 3: Optimal β for the approximate DFT

   n     m = 0    m = 1    m = 2    m = 3    m = 4    m = ∞
   8     0.3121   0.3745   0.3480   0.3480   0.3560   1/√8
  12     0.2478   0.2732   0.3008   0.2877   0.2877   1/√12

Table 4: Optimal β for the approximate DHT

   n     m = 0    m = 1    m = 2    m = 3    m = 4    m = ∞
   6     0.4509   0.6095   0.5511   0.5511   0.5944   1/√3
   8     0.3745   0.6243   0.4780   0.4779   0.5105   1/2
  12     0.3189   0.4310   0.3897   0.3897   0.4203   1/√6
  16     0.2800   0.3839   0.3328   0.3465   0.3575   1/√8
  24     0.2300   0.2950   0.2811   0.2784   0.2947   1/√12

Table 5: Optimal β for the approximate DCT

   n     m = 0    m = 1    m = 2    m = 3    m = 4    m = ∞
   6     0.4462   0.5955   0.5682   0.5572   0.5873   1/√3
   8     0.3922   0.4891   0.4831   0.4925   0.5014   1/2
  12     0.3303   0.3920   0.3929   0.4065   0.4090   1/√6
  16     0.2876   0.3320   0.3380   0.3497   0.3548   1/√8
  24     0.2342   0.2709   0.2816   0.2853   0.2889   1/√12
Figure 2: Discrete Hartley transform of a pure sinusoidal signal: (a) exact computation, (b) 0th order approximation, (c) 1st order approximation, (d) 2nd order approximation, (e) 3rd order approximation, and (f) 4th order approximation.

Figure 3: Condition number of approximate Fourier matrices for m = 0, 1, ..., 6, as a function of the blocklength n.

Except for the presence of D^{−1}, both forward and inverse approximate transformations share the same computational complexity. This is because their matrices are related by a simple transposition. However, if m → ∞, then K_n^(m) converges to K_n ∈ {F_n, H_n, C_n}. Being K_n a unitary matrix, we obtain that

lim_{m→∞} K_n^(m) (K_n^(m))^H = K_n K_n^H = I_n.   (22)

Thus, K_n^(m) is asymptotically unitary in terms of m. Therefore, for sufficiently large m, Equations (20) and (21) can be recast as:

v̂′ = β · K_n^(m) v,   (23)

v̂ ≈ v = β · (K_n^(m))^H v̂′,   (24)

where v̂ is an approximation due to the nonorthogonality of K_n^(m).

In view of the above discussion, we can re-examine the optimization procedure performed in the previous section. There, the approximate matrices were corrected by an overall constant factor termed β. In contrast to that, we can consider a more refined correction term. Notice that the term K_n^(m) (K_n^(m))^H is a Gram matrix, which is Hermitian and non-negative definite. Then, D^{−1} is also Hermitian [35, p. 82] and its matrix square root is well-defined, Hermitian [35, p. 82], and unique [40, p. 89]. Consequently, Equation (18) becomes:

(K_n^(m))^H (D^{−1/2}) (D^{−1/2}) K_n^(m) = I_n.   (25)

Taking into account that the inverse of a Hermitian matrix is also Hermitian [35, p.
82], we have that

((D^{−1/2}) K_n^(m))^H (D^{−1/2}) K_n^(m) = I_n.   (26)

It follows that (D^{−1/2}) K_n^(m) is a unitary matrix; and the term S^{−1} ≜ D^{−1/2} adjusts the approximate matrix. In fact, this adjustment term is exactly the inverse of the unique Hermitian non-negative definite matrix obtained when K_n^(m) is factorized according to the matrix polar decomposition procedure [35, p. 348]. Being the unique unitary matrix of a polar decomposition, the term S^{−1} K_n^(m) possesses some optimal approximation properties. In particular, when the Frobenius norm is utilized as a distance measure, S^{−1} K_n^(m) is the nearest unitary matrix to K_n^(m) [30]. Thus, Equations (20) and (21) can be modified into the following form:

v̂′ = S^{−1} K_n^(m) v,   (27)

v = (K_n^(m))^H S^{−1} v̂′,

where the scaling factor β was suppressed and its role is taken by S^{−1}.

Regarding the closeness of these approximations to the exact matrices, computational calculations for the considered blocklengths and approximation orders revealed that

‖K_n − S^{−1} K_n^(m)‖_F < ‖K_n − β K_n^(m)‖_F,   (28)

where ‖·‖_F denotes the Frobenius norm. In a sense, this could be expected, since the adjustment offered by the optimal scaling factor β corresponds simply to the matrix β · I_n. On the other hand, the adjustment by S^{−1} possesses a higher arithmetic complexity, which allows a better approximation.

Therefore, a significant part of the computational cost of using S^{−1} K_n^(m) as an approximation for K_n relies on the complexity of S^{−1}. This observation prompts us to examine the behavior of S^{−1}. Being K_n already unitary matrices, when η K_n is submitted to a polar decomposition, one obtains that

η · K_n = (η · I_n) · K_n,   (29)

where η is a normalizing factor equal to √n for the DFT and to √(n/2) for the DHT or DCT. Introducing a perturbation matrix ∆E, we can represent the discussed dyadic approximation according to K_n^(m) = η · K_n + ∆E.
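The orthogonalization step above can be sketched numerically. In the code below (our own, with the 1st order 8-point DCT approximation as an example), the unitary polar factor S^{−1} K_n^(m) is obtained via the SVD rather than an explicit matrix square root; both routes give the same factor for an invertible matrix.

```python
import numpy as np

def dct_matrix(n):
    """Unitary DCT-II matrix C_n of Equation (12), rows indexed by frequency."""
    i = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * np.outer(np.arange(n), 2 * i + 1) / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

def dyadic(M, m):
    """Componentwise dyadic approximation [M]_m, Equations (4) and (15).
    (np.round ties to even; no exact ties occur for these matrices.)"""
    return np.round(M * 2.0 ** m) / 2.0 ** m

n, m = 8, 1
K = dyadic(np.sqrt(n / 2.0) * dct_matrix(n), m)     # C_8^(1)
U, s, Vt = np.linalg.svd(K)
nearest_unitary = U @ Vt                            # polar unitary factor, equals S^{-1} K
S_inv = np.linalg.inv(U @ np.diag(s) @ U.T)         # inverse of the Hermitian polar factor
assert np.allclose(nearest_unitary, S_inv @ K)
assert np.allclose(nearest_unitary @ nearest_unitary.T, np.eye(n))
```

The SVD route works because K = (U Σ Uᵀ)(U Vᵀ) is exactly the polar decomposition, so S = U Σ Uᵀ and the nearest orthogonal matrix is U Vᵀ.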
Therefore, the polar decomposition of K_n^(m) is given by

η · K_n + ∆E = η · (I_n + ∆S) · (K_n + ∆U),   (30)

where ∆S and ∆U are the induced perturbation matrices in the Hermitian and orthogonal polar decomposition factors, respectively. The Hermitian matrix is related to the adjustment matrix as S = η · (I_n + ∆S). The matrix K_n + ∆U is recognized as an orthogonal approximation to K_n [30, 35]. More explicitly, we have that

K_n + ∆U = S^{−1} · K_n^(m)   (31)
         = η^{−1} · (I_n + ∆S)^{−1} · (η K_n + ∆E),   (32)

where ∆U has minimum Frobenius norm [30]. Thus, if ∆S approaches a null matrix, then η · S^{−1} is close to an identity matrix. This is desirable, since the complexity of an identity matrix is null.

Aiming to determine an upper bound for the distance between η · S^{−1} and I_n, we first analyze the distance between its inverse η^{−1} · S and I_n. This latter distance is quantified by the following approximation error:

ε₁ = ‖η^{−1} S − I_n‖_F / ‖I_n‖_F   (33)
   = ‖∆S‖_F / ‖I_n‖_F.   (34)

In a similar manner, it is reasonable to quantify the perturbation error induced by ∆E as

ε₂ = ‖∆E‖_F / ‖η K_n‖_F.   (35)

By construction, the elements of ∆E are bounded by 1/2^{m+1}. Thus, in the worst possible scenario, we have that

∆E = 2^{−(m+1)} J_n,   (36)

where J_n is a square matrix of ones with dimension n. This type of matrix and its properties are discussed in [41, p. 2]. Therefore, we obtain that ‖∆E‖_F = n / 2^{m+1}. Additionally, for all considered discrete sinusoidal transforms, we have that ‖K_n‖_F = √n. Thus, we have that

ε₂ ≤ √n / (2^{m+1} η) = (1 / 2^{m+1}) × {
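The elementwise bound on ∆E that drives this analysis is easy to confirm numerically; the sketch below (our own code) measures the actual perturbation ∆E = K_n^(m) − η K_n for the dyadic DCT matrices and checks it against 1/2^{m+1}.

```python
import numpy as np

def dct_matrix(n):
    """Unitary DCT-II matrix of Equation (12)."""
    i = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * np.outer(np.arange(n), 2 * i + 1) / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

def perturbation(n, m):
    """Delta_E = K_n^(m) - eta * K_n for the DCT, where eta = sqrt(n/2)."""
    eta_K = np.sqrt(n / 2.0) * dct_matrix(n)
    K_m = np.round(eta_K * 2.0 ** m) / 2.0 ** m     # dyadic approximation, Equation (15)
    return K_m - eta_K

for n in (8, 16):
    for m in range(5):
        dE = perturbation(n, m)
        # each rounded entry deviates by at most half of the grid spacing 1/2^m:
        assert np.max(np.abs(dE)) <= 1.0 / 2 ** (m + 1) + 1e-12
```

The worst case ∆E = 2^{−(m+1)} J_n of Equation (36) is pessimistic in practice, but the Frobenius norm bound it yields is what makes the error estimates blocklength-independent.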
1, for the DFT; √2, for the DHT or DCT }.   (37)

In [30], Higham submitted the polar decomposition procedure to a comprehensive error analysis. It was demonstrated that the error measures ε₁ and ε₂ can be related according to

ε₁ ≤ √2 · ε₂ + O(ε₂²).   (38)

Performing the necessary substitutions and observing that η/√n is a constant, we obtain:

ε₁ ≤ (1/2^m) × {
√2/2, for the DFT; 1, for the DHT or DCT } + O(1/2^{2m}).   (39)

Therefore, ε₁ = O(1/2^m). Now we return to the error analysis of η S^{−1}. Analogously, the sought approximation error is given by

ε′₁ = ‖η S^{−1} − I_n‖_F / ‖I_n‖_F.   (40)

Invoking results on the stability of matrix inversion [38, 42], we conclude that ε′₁ has the same asymptotic behavior as ε₁. Additionally, the asymptotic behaviors of ε₁ and ε′₁ are independent of the blocklength and are related only to the approximation order m. For a fixed m, we have that ε₁, ε′₁ = O(1) with respect to the blocklength. On the other hand, fixing the blocklength, we obtain that ε₁, ε′₁ = O(1/2^m).

4 Applications

8-point approximate DFT

The usual 8-point DFT is a fundamental building block of several signal processing methods [3, 43]. The 1st order 8-point approximate DFT possesses the following transformation matrix:

F_8^(1) =
  [ 1    1    1    1    1    1    1    1
    1    a   −j   −a*  −1   −a    j    a*
    1   −j   −1    j    1   −j   −1    j
    1   −a*   j    a   −1    a*  −j   −a
    1   −1    1   −1    1   −1    1   −1
    1   −a   −j    a*  −1    a    j   −a*
    1    j   −1   −j    1    j   −1   −j
    1    a*   j   −a   −1   −a*  −j    a ],   (41)

where a ≜ (1 − j)/2 and a* denotes its complex conjugate. Applying usual methods for matrix factorization [3], one can derive a fast algorithm of the form

F_8^(1) = P · A₄ · A₃ · T · A₂ · A₁,   (42)

where A₁, A₂, A₃, and A₄ are butterfly-type matrices whose entries are restricted to {0, ±1}, T is a sparse matrix whose nontrivial entries are among 1, −j, and (1 − j)/2, and P is a permutation matrix. Additionally, F_8^(1) furnishes the following relation:

(F_8^(1))^{−1} = (F_8^(1))^H D^{−1},   (46)

where

16 · D^{−1} = 2 · I₈ + M,   (47)

and M is the sparse integer matrix with unit entries on the odd diagonal positions, entries M(1,5) = M(5,1) = M(3,7) = M(7,3) = −1, and zeros elsewhere (indices counted from 0). Notice the extremely simple expression for D^{−1}, as well as its low computational complexity, which requires only additions and bit shifts.

Surprisingly, the inverse matrix of F_8^(1) is related to the zeroth order approximate matrix F_8^(0):

(F_8^(1))^{−1} =
(1/8) (F_8^(0))^H.   (48)

Per se, this relation constitutes the basis for a new complex-valued square wave transform, equipped with a meaningful transform domain (cf. the SDCT by Haweel [31]). It is important to recognize that this relation guarantees the perfect reconstruction property. Additionally, whenever a scaled version of the spectral components is admissible, the quantity β (Equation (21)) can be set to one. Possible scenarios are Fourier descriptor evaluation [44], transform domain threshold-based detectors [45], and signal classification based on transform coefficient features [46].

Figure 4: Flow graph for the fast algorithm of the 0th order 16-point approximate DHT. Dashed boxes denote shorter transforms embedded in the 16-point algorithm.

8- and 16-point approximate DHT

Besides being an elementary short blocklength transform employed as a means to evaluate larger size transforms [47], the 8-point DHT has found applications as an operator for edge detection [48]. The 1st order 8-point approximate DHT also implies another simple square wave transform, according to:

(H_8^(1))^{−1} =
(1/4) H_8^(0).   (49)

This fortunate relation allows the definition of a real-valued transformation whose spectrum approximates the 8-point DHT.

Now let us examine a fast algorithm for the 0th order 16-point approximate DHT. The obtained transformation matrix is given by

H_16^(0) = [√8 · H_16]_0,   (50)

a 16×16 matrix whose entries are restricted to {0, ±1}; its (i, k) entry is the nearest integer of cas(2π i k / 16)/√2. Using classical methods described in [3], the implementation diagram of a fast algorithm for H_16^(0) was found and is displayed in Figure 4. The resulting algorithm turned out to possess embedding properties. In the 16-point algorithm, the presence of fast algorithms for the 0th order 2-, 4-, and 8-point approximate DHT is noticeable. In Figure 4, such smaller transformations are separated in dashed boxes.

Table 6: Approximate DCT evaluation under the β adjustment (Equation (20)): MSE, coding gain C_g, and efficiency η for several values of m.

Table 7: Approximate DCT evaluation under the S^{−1} adjustment (Equation (27)): MSE, coding gain C_g, and efficiency η for several values of m; for m = ∞ (exact DCT), C_g = 8.8259 and η = 93.9912.

8-point approximate DCT

The 8-point DCT has attracted considerable research effort. This particular blocklength is widely adopted in several image and video coding standards, such as JPEG, MPEG-1, MPEG-2, H.261, and H.263 [49]. The 8-point DCT is also subject to an extensive analysis in [2]. Using Equations (20) and (27), we propose a new approximation for the 8-point DCT.

In order to evaluate the suggested approximations, we consider an input vector modelled after a first-order stationary Markov process with zero mean and unit variance. Additionally, it is assumed that adjacent vector components possess a correlation coefficient of 0.95 [2]. Considering these assumptions, commonly employed evaluation criteria, such as (i) mean square error; (ii) transform coding gain (C_g); and (iii) transform efficiency (η), can be computed deterministically [9].

Tables 6 and 7 list evaluation data for the proposed approximate 8-point DCT, when Equations (20) and (27) are considered, respectively. High values for the coding gain in Table 6 are due to the nonorthogonality of the considered transformations. This phenomenon was already reported in [50]. In [51, p.
18], Goyal gives a comprehensive account on how nonorthogonal transforms could outperform the Karhunen–Loève transform, for instance.

Existing approximation methods for the DCT include (i) the C-matrix transform (CMT) [52]; (ii) the integer cosine transform (ICT) [53]; (iii) the generalized Chen transform (GCT) [54]; and (iv) the binDCT algorithm [55]. The C-matrix transform consists of an approximation for the 8-point DCT using the Walsh–Hadamard transform as a pre-processing stage, followed by a conversion matrix of integer entries [52]. The integer cosine transform adopts a somewhat different approach. It directly approximates the DCT matrix by integer elements without any pre-processing. The GCT takes advantage of a parametrization of the DCT matrix, replacing exact parameter values by rational approximations, such as 3/16, 3/8, 11/16, and 91/128 [2, p. 211]. Being the parameters multiplicatively combined, the GCT scheme can provide a final approximate DCT matrix with elements of large integer representation (e.g., 1729/2048 or 1183/2048). In its turn, the binDCT employs an approach based on lifting schemes [9]. Although the individual multiplicative elements of the binDCT lifting structure are relatively small (e.g., 13/16 or 15/16), they are multiplied in cascade. The resulting basis vectors that approximate the DCT matrix possess elements such as 7823/8192 or 3217/4096 [2, p. 229]. So the elements of the final effective transformation matrix have a significantly larger dynamic range when compared to that of the individual constants employed by the GCT parametrization or by the binDCT lifting scheme.

Therefore, the final approximate quantities should be taken into consideration when deriving a comparison between approximation methods. Since the accuracy of approximation is closely related to the dynamic range of the utilized integer numbers, a fair comparison of performance could limit the size of the considered bit representation in the final approximate matrix.
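The evaluation criteria described above can be reproduced with a short computation. The sketch below is our own; it uses the usual definitions (see [2]): with R the covariance matrix of the unit-variance AR(1) source, the transform-domain covariance is T R Tᵀ, the coding gain compares the arithmetic and geometric means of its diagonal, and the efficiency measures the fraction of total absolute covariance concentrated on the diagonal.

```python
import numpy as np

def dct_matrix(n):
    """Unitary DCT-II matrix, Equation (12)."""
    i = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * np.outer(np.arange(n), 2 * i + 1) / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

def coding_gain_and_efficiency(T, rho=0.95):
    """Transform coding gain (dB) and transform efficiency for a Markov-1 source."""
    n = T.shape[0]
    R = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # AR(1) covariance
    S = T @ R @ T.T                        # transform-domain covariance
    d = np.diag(S)
    cg = 10 * np.log10(d.mean() / np.exp(np.log(d).mean()))  # arithmetic/geometric ratio
    eta = 100 * np.abs(d).sum() / np.abs(S).sum()
    return cg, eta

cg, eta = coding_gain_and_efficiency(dct_matrix(8))
# the exact 8-point DCT attains C_g ≈ 8.8259 dB and η ≈ 93.9912 at ρ = 0.95
```

Plugging an approximate matrix (e.g., S^{−1} C_8^(0)) into the same routine reproduces the style of comparison made in Tables 6–8.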
Table 8 brings a quantitative comparison of the suggested methodology with the referred alternative methods described in the literature. For each method, the dynamic range of the final approximation matrix is shown in parentheses; design parameters are also indicated for the proposed methodology.

The low complexity of the proposed 0th order approximate DCT can be of additional practical interest; C_8^(0) requires only 24 additions. Moreover, its orthogonalizing adjustment matrix is a simple diagonal matrix given by:

S^{−1} = diag(1/√8, 1/√6, 1/2, 1/√6, 1/√8, 1/√6, 1/2, 1/√6).   (51)

Considering the adjustment offered by S^{−1}, the final approximation offers an MSE comparable to that of several versions of the ICT [2, p. 177], which demand 2- or 3-bit arithmetic.

If the proposed nonorthogonal formulation that requires D^{−1} is chosen (Equations (20) and (21)), we obtain that

(C_8^(0))^{−1} = (C_8^(0))^H D^{−1},   (52)

where D^{−1} = diag(1/8, 1/6, 1/4, 1/6, 1/8, 1/6, 1/4, 1/6). The diagonal elements are small and impose low computational requirements. This can be interpreted as the basis relation for the definition of another new square wave transform. Additionally, in any case, when considering the DCT as a pre-processing step for a subsequent coefficient quantization procedure for image compression, the elements of D^{−1} can be included into the quantization step. This procedure is suggested and adopted in several works [32, 33]. As a consequence, the computational complexity of the approximation is totally confined to that of C_8^(0).

Additionally, Figure 5(a) illustrates the absence of DC leakage of C_8^(0). All frequency response curves vanish at null frequency, except, of course, the first curve, which corresponds to the moving average filter associated with the first row of the transformation matrix. Figure 5(b) shows the frequency response of the exact DCT for comparison [56].
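The three square wave relations above, Equations (48), (49), and (52), can be checked numerically. The sketch below is our own code; it builds the dyadic matrices from Equations (10)–(15), with rows indexed by the frequency variable (an orientation we assume for the DCT so that the DC row comes first), and verifies each identity.

```python
import numpy as np

def dyadic(M, m):
    """Componentwise dyadic approximation [.]_m (np.round ties to even,
    which never triggers on the values arising here)."""
    return np.round(M * 2.0 ** m) / 2.0 ** m

n = 8
k, i = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')  # rows: frequency index k

F = np.exp(-2j * np.pi * k * i / n) / np.sqrt(n)                                 # Eq. (10)
H = (np.cos(2 * np.pi * k * i / n) + np.sin(2 * np.pi * k * i / n)) / np.sqrt(n) # Eq. (11)
ck = np.where(np.arange(n) == 0, 1 / np.sqrt(2), 1.0)[:, None]
C = np.sqrt(2.0 / n) * ck * np.cos(np.pi * k * (2 * i + 1) / (2 * n))            # Eq. (12)

F0, F1 = dyadic(np.sqrt(n) * F, 0), dyadic(np.sqrt(n) * F, 1)          # Equation (13)
H0, H1 = dyadic(np.sqrt(n / 2) * H, 0), dyadic(np.sqrt(n / 2) * H, 1)  # Equation (14)
C0 = dyadic(np.sqrt(n / 2) * C, 0)                                     # Equation (15)

# Equation (48): the inverse of the 1st order DFT matrix is a scaled 0th order matrix.
assert np.allclose(np.linalg.inv(F1), F0.conj().T / 8)
# Equation (49): the analogous real-valued relation for the Hartley pair.
assert np.allclose(np.linalg.inv(H1), H0 / 4)
# Equation (52): the Gram matrix of C_8^(0) is diagonal, with the stated D^{-1}.
Dinv = np.linalg.inv(C0 @ C0.T)
assert np.allclose(Dinv, np.diag([1/8, 1/6, 1/4, 1/6, 1/8, 1/6, 1/4, 1/6]))
```

That all three assertions pass confirms the perfect reconstruction property of the proposed square wave transforms without any floating-point multiplier in the forward matrices.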
Conclusion

Using a simple and straightforward approach, we demonstrate that several approximate transforms can be conveniently obtained. It is important to emphasize that the proposed methods are general approaches, encompassing distinct transforms and various blocklengths. This fact contrasts with the highly specialized procedures archived in the literature. Nevertheless, we could still derive meaningful performance comparisons.

The proposed integer approximations are well suited for architectures that take advantage of dyadic rationals and the canonical signed digit representation. Overall, low computational complexities are obtained. Further dedicated optimization methods could enhance the proposed methods for selected blocklengths and kernels. Possible avenues include the elaboration of specially designed algorithms for the computation of particular adjustment matrices. Additionally, new 8-point square wave transforms equipped with perfect reconstruction and meaningful spectra were suggested for the DFT, DHT, and DCT.
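As a side illustration of the canonical signed digit (CSD) representation mentioned above (this recoding is standard material, not part of the paper's method), the nonadjacent-form algorithm that produces CSD digits can be sketched as:

```python
def csd_digits(n: int) -> list[int]:
    """Canonical signed digit (nonadjacent form) recoding of a positive
    integer n: least significant digit first, digits in {-1, 0, 1},
    with no two adjacent nonzero digits."""
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)   # +1 or -1, chosen so the next digit is zero
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

# Example: the multiplier 15 = 16 - 1 costs one subtraction in CSD
# instead of three additions for its plain binary form 1111.
print(csd_digits(15))         # [-1, 0, 0, 0, 1]
```

For dyadic multipliers such as 15/16, the same recoding applies to the numerator, with the power-of-two denominator realized by bit shifts; this is the arithmetic setting the proposed approximations target.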
Acknowledgments
This work was partially supported by the Department of Foreign Affairs and International Trade of Canada and the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil.
References

[1] W. L. Briggs and V. E. Henson, The DFT: An Owner's Manual for the Discrete Fourier Transform. SIAM, 1995.
[2] V. Britanak, P. Yip, and K. R. Rao, Discrete Cosine and Sine Transforms. Academic Press, 2007.
[3] R. E. Blahut, Fast Algorithms for Digital Signal Processing. Addison-Wesley, 1985.
[4] C. Van Loan, Computational Frameworks for the Fast Fourier Transform. SIAM, 1992.
[5] C. S. Burrus and T. Parks, DFT/FFT and Convolution Algorithms. New York: John Wiley & Sons, 1985.
[6] S. Winograd, Arithmetic Complexity of Computations, vol. 33 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, 1980.

[Table 8: quantitative comparison of the proposed method with ICT-II variants at 3-, 4-, 5-, 6-, and 11-bit dynamic ranges, listing MSE values and the design parameters m and β; the numeric entries were lost in extraction.]

[Figure 5: Fourier transform (magnitude) of the basis vectors of (a) C_8^(0) and (b) the exact DCT; both panels plot magnitude response (dB) against normalized frequency.]
[7] M. T. Heideman, Multiplicative Complexity, Convolution, and the DFT. Springer-Verlag, 1988.
[8] E. Feig and S. Winograd, "On the multiplicative complexity of discrete cosine transforms," IEEE Transactions on Information Theory, vol. 38, pp. 1387–1391, July 1992.
[9] J. Liang and T. D. Tran, "Fast multiplierless approximations of the DCT with the lifting scheme," IEEE Transactions on Signal Processing, vol. 49, pp. 3032–3044, Dec. 2001.
[10] I. S. Reed, M.-T. Shih, T. K. Truong, E. Hendon, and D. W. Tufts, "A VLSI architecture for simplified arithmetic Fourier transform algorithm," IEEE Transactions on Signal Processing, vol. 40, pp. 1122–1133, May 1992.
[11] R. J. Cintra and H. M. Oliveira, "How to interpolate in arithmetic transform algorithms," in Proceedings of the IEEE 27th International Conference on Acoustics, Speech, and Signal Processing, vol. 4, (Orlando, FL), p. IV, May 2002.
[12] R. J. Cintra and V. S. Dimitrov, "The arithmetic cosine transform: Exact and approximate algorithms," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3076–3085, 2010.
[13] N. Bhatnagar, "A binary friendly algorithm for computing discrete Hartley transform," in Proceedings of the 13th International Conference on Digital Signal Processing, (Santorini, Greece), pp. 353–356, July 1997.
[14] H. S. Dee and V. Jeoti, "Computing DFT using approximate fast Hartley transform," in Proceedings of the International Symposium on Signal Processing and its Applications (ISSPA), (Kuala Lumpur, Malaysia), pp. 100–103, Aug. 2001.
[15] N. Merhav and B. Vasudev, "A multiplication-free approximate algorithm for the inverse discrete cosine transform," in Proceedings of the 1999 International Conference on Image Processing, vol. 2, pp. 759–763, 1999.
[16] K. Lengwehasatit and A. Ortega, "DCT computation based on variable complexity fast approximations," in Proceedings of the 1998 International Conference on Image Processing, vol. 3, pp. 95–99, Oct. 1998.
[17] A. Hossen and U. Heute, "Fast approximate DCT: basic-idea, error analysis, applications," in Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 2005–2008, Apr. 1997.
[18] B. K. Natarajan and B. Vasudev, "A fast approximate algorithm for scaling down digital images in the DCT domain," in Proceedings of the International Conference on Image Processing, vol. 2, pp. 241–243, Oct. 1995.
[19] R. K. W. Chan and M.-C. Lee, "Multiplierless fast DCT algorithms with minimal approximation errors," in International Conference on Pattern Recognition, vol. 3, (Los Alamitos, CA, USA), pp. 921–925, IEEE Computer Society, 2006.
[20] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-complexity transform and quantization in H.264/AVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 598–603, July 2003.
[21] K. A. Wahid, V. S. Dimitrov, W. Badawy, and G. A. Jullien, "Error-free arithmetic and architecture for H.264," in Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers, pp. 703–707, 2005.
[22] S. Oraintara, Y.-J. Chen, and T. Q. Nguyen, "Integer fast Fourier transform," IEEE Transactions on Signal Processing, vol. 50, pp. 607–618, Mar. 2002.
[23] F. Xu, C.-H. Chang, and C.-C. Jong, "Hamming weight pyramid - a new insight into canonical signed digit representation and its applications," Computers & Electrical Engineering, vol. 33, no. 3, pp. 195–207, 2007.
[24] M. Koyuturk, A. Grama, and N. Ramakrishnan, "Nonorthogonal decomposition of binary matrices for bounded-error data compression and analysis," ACM Transactions on Mathematical Software, vol. 32, pp. 33–69, Mar. 2006.
[25] R. C. French and R. F. Mitchell, "Class of nonorthogonal transformations for signal processing," Electronics Letters, vol. 10, pp. 78–79, 1974.
[26] X. Yu, J. Wang, N. K. Loh, G. A. Jullien, and W. C. Miller, "Method for generating a new optimal nonorthogonal base in signal representation," Electronics Letters, vol. 28, pp. 2191–2193, Nov. 1992.
[27] C. Ding, X. He, and H. D. Simon, "On the equivalence of nonnegative matrix factorization and spectral clustering," in Proceedings of the SIAM Data Mining Conference, pp. 606–610, 2005.
[28] E. M. Fadaili, N. T. Moreau, and E. Moreau, "Nonorthogonal joint diagonalization/zero diagonalization for source separation based on time-frequency distributions," IEEE Transactions on Signal Processing, vol. 55, pp. 1673–1687, May 2007.
[29] W. B. Mikhael and A. P. Berg, "Image representation using nonorthogonal basis images with adaptive weight optimization," IEEE Signal Processing Letters, vol. 3, pp. 165–167, June 1996.
[30] N. J. Higham, "Computing the polar decomposition - with applications," SIAM Journal on Scientific and Statistical Computing, vol. 7, pp. 1160–1174, Oct. 1986.
[31] T. I. Haweel, "A new square wave transform based on the DCT," Signal Processing, vol. 82, pp. 2309–2319, 2001.
[32] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, "Low-complexity 8×8 transform for image compression," Electronics Letters, vol. 44, pp. 1249–1250, Sept. 2008.
[33] K. Lengwehasatit and A. Ortega, "Scalable variable complexity approximate forward DCT," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 1236–1248, Nov. 2004.
[34] S. G. Krantz, Real Analysis and Foundations. Chapman & Hall/CRC, 2005.
[35] G. A. F. Seber, A Matrix Handbook for Statisticians. John Wiley & Sons, Inc., 2007.
[36] X.-Y. Jing and D. Zhang, "A face and palmprint recognition approach based on discriminant DCT feature extraction," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, pp. 2405–2415, Dec. 2004.
[37] P. S. R. Diniz, Adaptive Filtering: Algorithms and Practical Implementation. Springer, 3rd ed., 2008.
[38] J. J. Du Croz and N. J. Higham, "Stability of methods for matrix inversion," IMA Journal of Numerical Analysis, vol. 12, pp. 1–19, 1992.
[39] D. J. Higham, "Condition numbers and their condition numbers," Linear Algebra and its Applications, vol. 214, pp. 193–213, 1995.
[40] C. R. Johnson, Matrix Theory and Applications. American Mathematical Society, 1990.
[41] B. K. Moser, Linear Models: A Mean Model Approach. Probability and Mathematical Statistics, Academic Press, 1996.
[42] G. H. Golub and C. F. Van Loan, Matrix Computations. JHU Press, 3rd ed., 1996.
[43] Y.-W. Lin and C.-Y. Lee, "Design of an FFT/IFFT processor for MIMO OFDM systems," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 54, pp. 807–815, Apr. 2007.
[44] W.-T. Wong, F. Y. Shih, and J. Liu, "Shape-based image retrieval using support vector machines, Fourier descriptors and self-organizing maps," Information Sciences, vol. 177, pp. 1878–1891, Apr. 2007.
[45] D. Bellan, "DFT-based detection of sinusoids in real-time applications," in Proceedings of the IEEE International Symposium on Intelligent Signal Processing, pp. 104–108, Aug. 2009.
[46] G. J. Miao and M. A. Clements, Digital Signal Processing and Statistical Classification. Artech House, 2002.
[47] R. N. Bracewell, The Hartley Transform. Oxford, 1986.
[48] R.-H. Park, K. S. Yoon, and W. Y. Choi, "Eight-point discrete Hartley transform as an edge operator and its interpretation in the frequency domain," Pattern Recognition Letters, vol. 19, pp. 569–574, May 1998.
[49] N. Roma and L. Sousa, "Efficient hybrid DCT-domain algorithm for video spatial downscaling," EURASIP Journal on Advances in Signal Processing, vol. 2007, no. 2, pp. 30–30, 2007.
[50] S. K. Raikar and A. Makur, "Noise feedback structure for non-orthogonal transform coding," in Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 6, pp. 497–500, Apr. 2003.
[51] V. K. Goyal, "Theoretical foundations of transform coding," IEEE Signal Processing Magazine, vol. 18, pp. 9–21, Sept. 2001.
[52] H. W. Jones, D. N. Hein, and S. C. Knauer, "The Karhunen-Loève, discrete cosine and related transforms obtained via the Hadamard transform," in Proceedings of the International Telemetering Conference, (Los Angeles, CA), pp. 14–16, Nov. 1978.
[53] W. K. Cham and Y. T. Chan, "Integer discrete cosine transforms," in Proceedings of the International Symposium on Signal Processing, Theories, Implementations, and Applications, (Brisbane, Australia), pp. 674–676, 1987.
[54] J. D. Allen and S. M. Blonstein, "The multiply-free Chen transform - a rational approach to JPEG," in Proceedings of the Picture Coding Symposium, (Tokyo, Japan), pp. 237–240, 1991.
[55] T. D. Tran, "The binDCT: fast multiplierless approximation of the DCT," IEEE Signal Processing Letters, vol. 7, pp. 141–144, June 2000.
[56] G. Strang, "The discrete cosine transform," SIAM Review, vol. 41, pp. 135–147, Mar. 1999.