Lossless Image and Intra-frame Compression with Integer-to-Integer DST

Abstract

Video coding standards are primarily designed for efficient lossy compression, but it is also desirable to support efficient lossless compression within video coding standards using small modifications to the lossy coding architecture. A simple approach is to skip transform and quantization and directly entropy code the prediction residual. However, this approach compresses poorly. A more efficient and popular approach is to skip transform and quantization but also process the residual block with DPCM, along the horizontal or vertical direction, prior to entropy coding. This paper explores an alternative approach based on processing the residual block with integer-to-integer (i2i) transforms. I2i transforms can map integer pixels to integer transform coefficients without increasing the dynamic range and can be used for lossless compression. We focus on lossless intra coding and develop novel i2i approximations of the odd type-3 DST (ODST-3). Experimental results with the HEVC reference software show that the developed i2i approximations of the ODST-3 improve lossless intra-frame compression efficiency by 2.7% on average with respect to HEVC version 2, which uses the popular DPCM method, without a significant effect on computational complexity.



Fatih Kamisli, Member, IEEE


Index Terms: Image coding, video coding, discrete cosine transforms, lossless coding, HEVC

I. INTRODUCTION

Video coding standards are primarily designed for efficient lossy compression, but it is also desirable to support efficient lossless compression within video coding standards. However, to avoid increasing system complexity, lossless compression is typically supported using small modifications to the lossy coding architecture.

Lossy compression in modern video coding standards, such as HEVC [1] or H.264 [2], is achieved with a block-based approach. First, a block of pixels is predicted using pixels either from a previously coded frame (inter prediction) or from previously coded regions of the current frame (intra prediction). The prediction is often not sufficiently accurate, so in the next step the block of prediction error pixels (the residual) is computed and then transformed to reduce the remaining spatial redundancy. Finally, the transform coefficients are quantized and entropy coded together with other relevant side information such as prediction modes.

To also support lossless compression within the block-based lossy coding architecture summarized above, the simplest approach is to skip the transform and quantization steps and directly entropy code the prediction residual block. This approach is indeed used in HEVC version 1 [1]. While this is a simple and low-complexity approach, it is well known that prediction residuals are not sufficiently decorrelated in many regions of video sequences, and directly entropy coding a prediction residual block is inefficient at compression. Hence, a large number of approaches have been proposed to develop more efficient lossless compression methods for video coding.

F. Kamisli is with the Department of Electrical and Electronics Engineering at the Middle East Technical University, Ankara, Turkey.

A more efficient and popular approach is to skip transform and quantization but process the residual block with differential pulse code modulation (DPCM) prior to entropy coding [3], [4]. While there are many variations of this approach [3], [4], [5], [6], the video coding standards HEVC and H.264 include the simple horizontal and vertical DPCM due to their low complexity and reasonable compression performance.

This paper explores an alternative approach for lossless compression within video coding standards. Instead of DPCM, integer-to-integer (i2i) transforms are used to process the residual block. I2i transforms can map inputs on a uniform discrete lattice to outputs on the same lattice and are invertible [7]. In other words, i2i transforms can map integer pixels to integer transform coefficients. Note, however, that unlike the integer transforms used in HEVC for lossy coding [8], i2i transforms do not increase the dynamic range at the output and can therefore be easily employed in lossless coding. While many papers employ i2i approximations of the discrete cosine transform (DCT) in lossless image compression [9], we are not aware of any work that explores i2i transforms for lossless compression of prediction residuals in video coding, in particular in H.264 or HEVC.

This paper focuses on lossless compression for intra coding. Some preliminary results for lossless inter coding are provided in [10].
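To make the dynamic-range remark concrete, here is a small illustration of our own (Python with NumPy; the function name `odst3` is hypothetical). The 4-point integer DST matrix that HEVC uses for 4x4 intra residuals equals the ODST-3 basis (defined later in Equation (10) of Section IV-A) scaled by 128 and rounded, so each 1-D pass produces integer coefficients that occupy roughly 7 more bits than the input. This is the kind of dynamic-range growth that the text contrasts with i2i transforms.

```python
import numpy as np

# 4-point integer DST matrix of HEVC; rows are basis functions.
HEVC_DST4 = np.array([
    [29, 55, 74, 84],
    [74, 74, 0, -74],
    [84, -29, -74, 55],
    [55, -84, 74, -29],
])

def odst3(N):
    """Orthonormal ODST-3: [S]_{m,n} = 2/sqrt(2N+1) * sin((2m-1) n pi / (2N+1))."""
    m = np.arange(1, N + 1)[:, None]
    n = np.arange(1, N + 1)[None, :]
    return 2.0 / np.sqrt(2 * N + 1) * np.sin((2 * m - 1) * n * np.pi / (2 * N + 1))

S = odst3(4)
# The integer matrix is a rounded, 128x-scaled copy of the orthonormal ODST-3.
print(np.array_equal(np.round(128 * S).astype(int), HEVC_DST4))  # True
# Each integer basis vector has norm close to 128, i.e. about 7 bits of gain.
print(np.linalg.norm(HEVC_DST4, axis=1))
```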
In lossy intra coding, it is known that a hybrid separable 2D transform based on the odd type-3 discrete sine transform (ODST-3) and the DCT [11], [12], or simply a 2D ODST-3 [1], provides better compression performance than the traditional 2D DCT when transform coding block-based spatial prediction residuals. While the literature includes substantial previous research on i2i DCTs [9], [13], [14], we could not find any i2i approximations of the ODST-3. Therefore, in this paper, we first explore the design of i2i approximations of the ODST-3 and then provide lossless intra-frame compression results with the developed approximations. Our experimental results with the HEVC reference software indicate that, using the developed i2i approximations of the ODST-3, the lossless intra-frame compression of HEVC version 2, which uses the popular DPCM method along the horizontal or vertical direction, can be improved by 2.7% on average without a significant complexity increase.

The remainder of the paper is organized as follows. Section II provides a brief overview of related previous research on lossless video compression. Section III discusses i2i transforms and their design based on plane rotations and the lifting scheme. Section IV presents a framework for designing computationally efficient i2i approximations of the ODST-3. Section V presents experimental results with the designed i2i approximations of the ODST-3 within HEVC and compares them with those of HEVC versions 1 and 2. Finally, Section VI concludes the paper. Note that some preliminary results of this work were presented in [10], [15].

II. PREVIOUS RESEARCH ON LOSSLESS VIDEO COMPRESSION

One of the simplest methods to support lossless compression within video codecs primarily designed for lossy coding is to skip the transform and quantization steps and directly entropy code the prediction residual block. This approach is indeed used in HEVC version 1 [1]. While this is a low-complexity approach, it is inefficient at compression since prediction residuals are typically not well decorrelated. Hence, a large number of approaches have been proposed to develop more efficient lossless compression methods for video coding. These approaches can be categorized into three groups, which we briefly review as follows.

A. Methods based on residual DPCM

The first group of methods is based on processing the residual blocks, obtained from the block-based spatial or temporal prediction of the lossy coding architecture, with differential pulse code modulation (DPCM) prior to entropy coding; these methods are typically called residual DPCM (RDPCM) methods. There are many variations of RDPCM methods in the literature for both lossless intra and inter coding [3], [4], [5], [6]. RDPCM methods process the prediction residual block with a specific pixel-by-pixel prediction method, which is typically the distinguishing feature among the many RDPCM methods.

One of the earliest RDPCM methods was proposed in [3] for lossless intra coding in H.264. Here, after the block-based spatial prediction is performed, a simple pixel-by-pixel differencing operation is applied to the residual pixels, only in the horizontal and vertical intra prediction modes. In the horizontal intra mode, each residual pixel's left neighbor is subtracted from it, and the result is the RDPCM pixel of the block. Similar differencing is performed along the vertical direction in the vertical intra mode. Note that the residuals of the other angular intra modes are not processed in [3]: accounting for the directional correlation of their residuals would require directional pixel-by-pixel prediction with a different interpolation for each angular prediction mode, and the resulting additional compression gain does not justify the complexity increase.

The same RDPCM method as in [3] is now included in HEVC version 2 [16], [17] for intra and inter coding. In inter coding, RDPCM is applied along either the horizontal or the vertical direction, or not at all; a flag is coded in each transform unit (TU) to indicate whether it is applied and, if so, another flag is coded to indicate the direction. In intra coding, RDPCM is applied only when the intra prediction mode is horizontal or vertical, and no flag is coded since the RDPCM direction is inferred from the intra prediction mode.
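The horizontal and vertical RDPCM just described, and their exact integer inverses, can be sketched as follows. This is a minimal illustration in Python with NumPy, not the HEVC reference implementation; the function names are our own.

```python
import numpy as np

def rdpcm(residual, direction):
    """Horizontal or vertical RDPCM of an integer residual block.

    The first column (horizontal) or first row (vertical) is kept as is;
    every other sample is replaced by its difference from the previous
    sample along the chosen direction.
    """
    axis = {"horizontal": 1, "vertical": 0}[direction]
    r = np.asarray(residual, dtype=np.int64)
    # prepend=0 keeps the first sample and differences the rest.
    return np.diff(r, axis=axis, prepend=0)

def inverse_rdpcm(d, direction):
    """Undo RDPCM with a running sum along the same direction (exact in integers)."""
    axis = {"horizontal": 1, "vertical": 0}[direction]
    return np.cumsum(np.asarray(d, dtype=np.int64), axis=axis)

r = np.array([[3, 4, 6], [1, 1, 2]])
print(rdpcm(r, "horizontal").tolist())  # [[3, 1, 2], [1, 0, 1]]
```

Because the inverse is a plain cumulative sum, the decoder recovers the residual exactly, which is what makes this step usable for lossless coding.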

B. Methods based on pixel-by-pixel spatial prediction

The second group of methods can be used only in lossless intra coding and is based on replacing the block-based spatial prediction with pixel-by-pixel spatial prediction. Since the transform is skipped in lossless coding, a pixel-by-pixel spatial prediction approach can be used instead of block-based prediction for more efficient prediction.

The literature contains many lossless intra coding methods based on the pixel-by-pixel prediction approach [18], [19], [20]. The so-called Sample-based Angular Prediction (SAP) method is a well-known example [18]. In the application of the SAP method to HEVC [18], only the angular intra modes are modified, and the DC and planar intra modes remain unmodified. In these modified angular intra modes, the same angular projection directions and linear interpolation equations as in HEVC's intra prediction are used, but the reference samples are modified: instead of the block's neighboring pixels, each pixel's immediate neighbor pixels are used as reference pixels for prediction, resulting in a pixel-by-pixel version of HEVC's block-based intra prediction.

Instead of using the HEVC intra prediction equations for pixel-by-pixel spatial prediction, [21] develops a more general pixel-by-pixel spatial prediction method that uses 3 neighboring pixels in each intra mode of HEVC, and it reports one of the best lossless intra coding performances within HEVC.

While lossless intra coding methods based on pixel-by-pixel spatial prediction can provide competitive compression performance, their distinguishing feature can also be a drawback. Their pixel-based nature is not congruent with the block-based architecture of video coding standards and introduces undesired pixel-based dependencies in the prediction architecture that can reduce throughput in the processing pipeline of video encoders and decoders [18], [21].
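For the pure horizontal mode, where no interpolation is involved, predicting each pixel from its immediate left neighbor yields exactly the same residual as block-based horizontal prediction followed by the horizontal RDPCM of Section II-A. The sketch below checks this numerically; it is our own illustration (hypothetical function names), not code from [18].

```python
import numpy as np

def block_horizontal_residual(block, left):
    # Block-based horizontal prediction: every pixel of row y is predicted
    # by the block's left neighbor left[y].
    return block - left[:, None]

def pixelwise_horizontal_residual(block, left):
    # Pixel-by-pixel horizontal prediction: every pixel is predicted by its
    # immediate left neighbor (left[y] for the first column).
    reference = np.concatenate([left[:, None], block[:, :-1]], axis=1)
    return block - reference

def horizontal_rdpcm(residual):
    # Keep the first column, difference the rest along each row.
    return np.diff(residual, axis=1, prepend=0)

rng = np.random.default_rng(1)
block = rng.integers(0, 256, size=(4, 4))
left = rng.integers(0, 256, size=4)
same = np.array_equal(horizontal_rdpcm(block_horizontal_residual(block, left)),
                      pixelwise_horizontal_residual(block, left))
print(same)  # True
```

For angular modes that need interpolation the two approaches no longer coincide, which is where pixel-by-pixel methods such as SAP differ from RDPCM.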

C. Methods based on modiﬁed entropy coding

The third group of methods considers entropy coding. In lossy coding, the transform coefficients of prediction residuals are entropy coded, while in lossless coding, the prediction residuals themselves are entropy coded. Since the statistics of prediction residuals differ from those of quantized transform coefficients, several modifications of entropy coding have been proposed for lossless coding [22], [23], [24]. HEVC version 2 includes such modifications, for example reversing the scan order of coefficients and using a dedicated context model for the significance map, among other tools [16], [25].

III. INTEGER-TO-INTEGER (I2I) TRANSFORMS

Integer-to-integer (i2i) transforms map integer inputs to integer outputs and are invertible [7]. Note that unlike the integer transforms in HEVC [8], which also map integer residual pixels to integer transform coefficients by implementing the transform operations with fixed-point arithmetic, the i2i transforms considered here do not increase the dynamic range at the output. Therefore, they can be easily used in lossless compression.

Fig. 1. (a) Plane rotation and (b) its decomposition into a structure with three lifting steps and (c) the inverse structure.

One possible method to obtain an i2i transform is to decompose a known orthogonal transform into a cascade of plane rotations and then approximate each plane rotation with a lifting structure [7], [26], which can map integer inputs to integer outputs.
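The lifting factorizations of a plane rotation reviewed in Section III-A (three lifting steps with parameters q and r, and the type-1 to type-4 structures with two lifting steps p, u and scaling factors K1, K2) can be checked numerically in floating point. The sketch below is our own (Python with NumPy, hypothetical function names). Types 1 and 3 use the lifting order of the type-1 matrix form; which branch each lifting step updates in types 2 and 4 is our assumption, chosen so that the stated parameter relations reproduce the rotation.

```python
import numpy as np

def rotation(a):
    """Plane rotation P(alpha) of Equation (1)."""
    return np.array([[np.cos(a), np.sin(a)], [-np.sin(a), np.cos(a)]])

def lift_upper(c):
    """Lifting step: upper branch += c * lower branch."""
    return np.array([[1.0, c], [0.0, 1.0]])

def lift_lower(c):
    """Lifting step: lower branch += c * upper branch."""
    return np.array([[1.0, 0.0], [c, 1.0]])

def three_lift(a, inverse=False):
    """Three lifting steps q, r, q (rightmost matrix is applied first)."""
    q = (np.cos(a) - 1.0) / np.sin(a)
    r = np.sin(a)
    if inverse:
        # Same steps in reverse order, subtracting instead of adding.
        return lift_lower(-q) @ lift_upper(-r) @ lift_lower(-q)
    return lift_lower(q) @ lift_upper(r) @ lift_lower(q)

def two_lift(a, kind):
    """Two lifting steps followed by scaling diag(K1, K2), types 1-4."""
    c, s = np.cos(a), np.sin(a)
    if kind == 1:
        p, u, k1, k2 = s / c, -s * c, c, 1.0 / c
        core = lift_lower(u) @ lift_upper(p)
    elif kind == 2:
        p, u, k1, k2 = -s / c, s * c, 1.0 / c, c
        core = lift_upper(u) @ lift_lower(p)
    elif kind == 3:
        p, u, k1, k2 = -c / s, s * c, -s, 1.0 / s
        core = lift_lower(u) @ lift_upper(p)
    else:
        p, u, k1, k2 = c / s, -s * c, -1.0 / s, s
        core = lift_upper(u) @ lift_lower(p)
    return np.diag([k1, k2]) @ core

a = np.pi / 8
print(np.allclose(three_lift(a), rotation(a)))                              # True
print(np.allclose(three_lift(a, inverse=True) @ three_lift(a), np.eye(2)))  # True
print(np.allclose(two_lift(a, 1), rotation(a)))                             # True
print(np.allclose(two_lift(a, 3), rotation(a)[::-1]))  # True: permuted outputs
```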

A. Plane rotations and the lifting scheme

A plane rotation can be represented with the 2x2 matrix given in Equation (1), which is also shown as a flow graph in Figure 1 (a):

$$P(\alpha) = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix} \quad (1)$$

The significance of plane rotations comes from the fact that orthogonal transforms can be designed by cascading multiple plane rotations.

A plane rotation can be decomposed into a structure with three lifting steps or a structure with two lifting steps and two scaling factors [9]. Consider first the decomposition into a structure with three lifting steps, shown in Figure 1 (b), which is represented in matrix form as

$$\begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ q & 1 \end{bmatrix} \begin{bmatrix} 1 & r \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ q & 1 \end{bmatrix} \quad (2)$$

where $q = (\cos\alpha - 1)/\sin\alpha$ and $r = \sin\alpha$. Each lifting step can be inverted with another lifting step because

$$\begin{bmatrix} 1 & 0 \\ q & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ -q & 1 \end{bmatrix}, \qquad \begin{bmatrix} 1 & r \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & -r \\ 0 & 1 \end{bmatrix}. \quad (3)$$

In other words, each lifting step is inverted by subtracting what was added in the forward lifting step. Thus, the inverse of the structure with three lifting steps is obtained by cascading the same lifting steps in reverse order, with subtraction instead of addition, as shown in Figure 1 (c).

Consider now the decomposition of a plane rotation into a structure with two lifting steps and two scaling factors. There are four such decompositions, shown in Figure 2. Note that the type-3 and type-4 decompositions in Figure 2 (d) and (e) have permuted outputs; in other words, output $y_2$ (together with its scaling factor) is now in the upper branch and output $y_1$ (together with its scaling factor) in the lower branch.

Fig. 2. (a) Plane rotation and its decomposition into structures with two lifting steps and two scaling factors. There are four possible decompositions, shown in (b), (c), (d) and (e). The decompositions in (d) and (e) have permuted outputs.

These decompositions can also be represented in matrix form. For example, the decomposition in Figure 2 (b) can be represented as

$$\begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix} = \begin{bmatrix} K_1 & 0 \\ 0 & K_2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ u & 1 \end{bmatrix} \begin{bmatrix} 1 & p \\ 0 & 1 \end{bmatrix} \quad (4)$$

The lifting parameters $p$ and $u$ and the scaling factors $K_1$ and $K_2$ in all four types of decompositions can be related to the rotation angle $\alpha$ of the plane rotation by first writing the linear equations that relate the inputs to the outputs for the decompositions and for the plane rotation, and then equating them. This results in the following relations.

For the type-1 decomposition in Figure 2 (b):
• $p = \tan\alpha$, $u = -\sin\alpha\cos\alpha$
• $K_1 = \cos\alpha$, $K_2 = 1/\cos\alpha$.

For the type-2 decomposition in Figure 2 (c):
• $p = -\tan\alpha$, $u = \sin\alpha\cos\alpha$
• $K_1 = 1/\cos\alpha$, $K_2 = \cos\alpha$.

For the type-3 decomposition in Figure 2 (d):
• $p = -1/\tan\alpha$, $u = \sin\alpha\cos\alpha$
• $K_1 = -\sin\alpha$, $K_2 = 1/\sin\alpha$.

For the type-4 decomposition in Figure 2 (e):
• $p = 1/\tan\alpha$, $u = -\sin\alpha\cos\alpha$
• $K_1 = -1/\sin\alpha$, $K_2 = \sin\alpha$.

With these parameters, all four types of decompositions are equivalent to the plane rotation in Figure 2 (a), i.e., they have the same input-output relation, except that the type-3 and type-4 decompositions have permuted outputs, which is merely a reordering of the output signal. However, when designing i2i transforms, the lifting parameters $p$ and $u$ may be quantized and the scaling factors $K_1$ and $K_2$ can become important; therefore, one type of decomposition can be preferred over the others despite all having the same input-output relation. This issue is discussed in more detail in Section IV-D, where we discuss the design of i2i approximations of the odd type-3 DST (ODST-3) based on lifting decompositions of cascaded plane rotations.

Decompositions with two lifting steps and two scaling factors can be inverted by going in the reverse direction, inverting first the scaling factors and then the lifting steps.

B. Integer-to-integer mapping property

Consider now the integer-to-integer mapping property of the lifting steps. In all of the above decompositions, each lifting step can map integers to integers by introducing a simple rounding operation. If the result of multiplying integer input samples with the lifting parameter p or u is rounded to an integer, each lifting step maps integer inputs to integer outputs [7], [9]. Notice that as long as the same rounding operation is applied in both the forward and inverse lifting steps, the inversion of a lifting step remains the same, i.e., subtract what was added in the forward lifting step. In summary, each lifting step can map integers to integers (and is still easily inverted) by introducing a rounding operation after the multiplication with the lifting parameter p or u.

The scaling factors in the decompositions in Figure 2 violate the integer-to-integer mapping property if they are not integers. If they are integers, they introduce an unnecessary artificial scaling. Thus, scaling factors seem to pose a problem for the integer-to-integer mapping property of the lifting decompositions in Figure 2; Section IV-D discusses how to deal with them when designing i2i transforms from cascaded lifting decompositions.

Floating-point multiplications can be avoided in lifting steps if the lifting parameters p and u are approximated with dyadic rationals of the form $k/2^l$ ($k$ and $l$ are integers), which can be implemented with only integer additions and bitshifts (an integer multiplication can itself be performed with additions and bitshifts). Note that the bitshift operation implicitly includes a rounding operation, which provides the integer-to-integer mapping discussed above. The integers $k$ and $l$ can be chosen depending on the desired accuracy of the approximated lifting operation and the desired level of computational complexity.

Fig. 3. Factorization of 4-point DCT.
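The rounding-based integer mapping and the dyadic $k/2^l$ approximation can be combined in a short sketch of an i2i plane rotation built from the three lifting steps of Equation (2). This is our own illustration in plain Python: the helper names are hypothetical, and the precision $l = 8$ bits is an arbitrary choice, not a value from the paper.

```python
import math

def dyadic_numerator(c: float, l: int) -> int:
    """Numerator k such that k / 2**l approximates the lifting parameter c."""
    return round(c * (1 << l))

def lift(x: int, k: int, l: int) -> int:
    """Integer lifting increment round(k * x / 2**l), using an add and a right shift."""
    return (k * x + (1 << (l - 1))) >> l

def i2i_rotation(x1: int, x2: int, alpha: float, l: int = 8, inverse: bool = False):
    """Three-lifting-step rotation with rounding after each multiplication."""
    q = dyadic_numerator((math.cos(alpha) - 1.0) / math.sin(alpha), l)
    r = dyadic_numerator(math.sin(alpha), l)
    if not inverse:
        x2 += lift(x1, q, l)  # lower += round(q * upper)
        x1 += lift(x2, r, l)  # upper += round(r * lower)
        x2 += lift(x1, q, l)  # lower += round(q * upper)
    else:
        # Undo the steps in reverse order, subtracting exactly what was added.
        x2 -= lift(x1, q, l)
        x1 -= lift(x2, r, l)
        x2 -= lift(x1, q, l)
    return x1, x2

y = i2i_rotation(100, -37, math.pi / 8)
print(y)  # integers close to the floating-point rotation of (100, -37)
print(i2i_rotation(*y, math.pi / 8, inverse=True))  # (100, -37)
```

The inverse is exact because each inverse step recomputes the same rounded increment from a branch that the step did not modify, which is the property described above.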

C. I2i DCT

A significant amount of work on i2i transforms has been done to develop i2i approximations of the discrete cosine transform (DCT). One of the most popular methods, due to its low computational complexity, is to utilize the factorization of the DCT into plane rotations and butterfly structures [27], [28], [9]. Two well-known factorizations of the DCT into plane rotations and butterflies are Chen's and Loeffler's factorizations [27], [28]. Loeffler's 4-point DCT factorization is shown in Figure 3. It contains three butterflies, one plane rotation and a scaling factor at the end of each branch.

Consider first the three butterfly structures shown in Figure 3. A butterfly structure maps integers to integers because its output samples are the sum and difference of its inputs. It is also easily inverted: applying the same butterfly again and dividing the output samples by 2 recovers the inputs.

The plane rotation in Figure 3 can be decomposed into three lifting steps, or into two lifting steps and two scaling factors, as discussed in Section III-A, to obtain an integer-to-integer mapping. Using two lifting steps reduces the complexity, and the two scaling factors can be combined with the other scaling factors at the output.

The scaling factors at the output can be absorbed into the quantization stage in lossy coding. In lossless coding, all scaling factors can be omitted. However, care is needed when omitting scaling factors since, for some branches, the dynamic range of the output may become too high. For example, in Figure 3, the DC output sample (i.e., R[0]) becomes the sum of all input samples when the scaling factors are omitted; it may be preferable for it to be the average of all input samples, which can improve the entropy coding performance [9].
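The sum-versus-average point can be illustrated with a plain butterfly and a lifting pair of the kind often called the S-transform, whose first output is the floor of the average. This is our own Python sketch (hypothetical function names); we do not claim it is the exact structure of Figure 4 or [9].

```python
def butterfly(a: int, b: int):
    """Sum/difference butterfly: integers in, integers out, but the sum doubles the range."""
    return a + b, a - b

def butterfly_inverse(s: int, d: int):
    """Applying the butterfly again gives (2a, 2b); dividing by 2 recovers the inputs."""
    return (s + d) // 2, (s - d) // 2

def lifting_butterfly(a: int, b: int):
    """Lifting version: take the difference, then add half of it back to b,
    so the first output is floor((a + b) / 2), an average instead of a sum."""
    d = a - b
    s = b + (d >> 1)
    return s, d

def lifting_butterfly_inverse(s: int, d: int):
    """Subtract what was added, in reverse order."""
    b = s - (d >> 1)
    return b + d, b

print(butterfly(7, 4), lifting_butterfly(7, 4))  # (11, 3) (5, 3)
```

Both versions are exactly invertible on integers, but only the lifting version keeps the low-pass output in the input's range.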
To obtain an i2i DCT for use in lossless coding, the butterflies of Figure 3 are therefore replaced with lifting steps that adjust the dynamic range at the output of each branch (or, equivalently, the norm of each analysis basis function), and the scaling factors at the output are omitted, resulting in the i2i DCT shown in Figure 4 [9].

Fig. 4. Lifting-based i2i approximation of DCT for lossless compression.

IV. INTEGER-TO-INTEGER APPROXIMATION OF ODD TYPE-3 DST

To the best of our knowledge, an integer-to-integer (i2i) approximation of the odd type-3 DST (ODST-3) has not appeared in the literature. To develop such an approximation, we first approximate the ODST-3 with a cascade of plane rotations and then approximate these rotations with lifting steps to obtain i2i approximations of the ODST-3 for use in lossless intra-frame coding.

An overview of this section is as follows. In Section IV-A, the auto-correlation of the block-based spatial prediction residual and its optimal transform as the correlation coefficient approaches 1, i.e., the ODST-3, are reviewed. Next, Section IV-B presents a coding gain expression. Section IV-C presents an algorithm to approximate the 4-point ODST-3 through plane rotations. In Section IV-D, the plane-rotation-based approximation is used to obtain i2i approximations of the 4-point ODST-3. Finally, Section IV-E discusses i2i approximations of the ODST-3 for larger block sizes.

A. Block-based spatial prediction, auto-correlation of its residual and the odd type-3 DST (ODST-3)

Block-based spatial prediction, also commonly called intra prediction, is a widely used technique for predictive coding of intra-frames in modern video coding standards [2], [29]. In this well-known method, a block of pixels is predicted by copying the block's spatially neighboring pixels (which reside in the previously reconstructed left and upper blocks) along a predefined direction inside the block [29]. While H.264 supports 8 such directional intra prediction modes (each copying spatial neighbors along a different direction) in 4x4 and 8x8 blocks, HEVC supports 33 such modes (shown in Figure 5) for blocks of sizes 4x4, 8x8, 16x16 and 32x32. In these standards, the prediction residual block, obtained by subtracting the prediction block from the original block, is transformed and quantized in lossy coding, or processed with DPCM in lossless coding, prior to entropy coding.

The optimal transform for the lossy coding of the spatial prediction residual block was determined to be the hybrid DCT/ODST-3, based on modeling the image pixels with a first-order Markov process [11], [12]. Depending on the copying direction of the prediction mode, the DCT or the ODST-3 is applied in the horizontal and/or vertical direction, forming a hybrid 2D transform. In particular, if the copying direction of the prediction mode is horizontal, the ODST-3 is applied along the horizontal direction and the DCT along the vertical direction. Similarly, if the copying direction of the prediction mode is vertical, the ODST-3 is applied along the vertical direction and the DCT along the horizontal direction.

Note that although a mode-dependent hybrid transform approach was derived in [11], compression experiments have shown that using the 2D ODST-3 for all intra modes gives similar compression performance in lossy coding in HEVC, and the HEVC standard uses the 2D ODST-3 for all 4x4 intra modes [1]. Based on this result, we also use i2i approximations of the 2D ODST-3 for all intra modes in our experiments in Section V.

Fig. 5. Copying directions of intra prediction modes in HEVC. Modes 2-34 are angular copying modes with the directions shown, and modes 0 and 1 are the non-angular DC and planar prediction modes, respectively [29].

Fig. 6. A 4-pixel image row (white pixels $u(i)$, $i = 1, \ldots, 4$) and its neighbor pixel (gray pixel $u(0)$) modeled with a first-order Markov process. The spatial prediction pixels ($\hat{u}(i)$, $i = 1, \ldots, 4$) of the block are obtained by copying the block neighbor pixel $u(0)$; in other words, $\hat{u}(i) = u(0)$, $i = 1, \ldots, 4$.

Now, we briefly review the derivation of the auto-correlation of the block-based spatial prediction residual because it will be used to develop i2i transforms that approximate the ODST-3 for lossless intra-frame compression. We use a 1D signal in our discussion for simplicity and because the result can be used for 2D signals by constructing separable 2D transforms as in [11], [12].

A first-order Markov process, which is used to model image pixels horizontally within a row (as shown in Figure 6) or vertically within a column, is represented recursively as

$$u(i) = \rho \cdot u(i-1) + w(i) \quad (5)$$

where $\rho$ is the correlation coefficient, $u(i)$ are zero-mean, unit-variance process samples and $w(i)$ are zero-mean, white noise samples with variance $1 - \rho^2$. The auto-covariance, or correlation, of the process is given by

$$E[u(i) \cdot u(j)] = \rho^{|i-j|}. \quad (6)$$

It is well known that the Discrete Cosine Transform (DCT) is the optimal transform for the first-order Markov process as its correlation coefficient $\rho$ approaches 1 [30].

The spatial prediction block is obtained by copying the neighbor pixel of the block, $u(0)$, inside the block. In other words, the spatial prediction pixels are $\hat{u}(i) = u(0)$, $i = 1, \ldots, N$, where $N$ is the block length. The residual block pixels $r(i)$, $i = 1, \ldots, N$, are obtained by subtracting the spatial prediction pixels $\hat{u}(i)$ from the original pixels $u(i)$:

$$r(i) = u(i) - \hat{u}(i) = u(i) - u(0). \quad (7)$$

The auto-correlation of the residual pixels, $E[r(i) r(j)]$, is obtained as follows:

$$E[r(i) r(j)] = E[(u(i) - u(0))(u(j) - u(0))] = \rho^{|i-j|} - \rho^{i} - \rho^{j} + 1, \quad i, j \in \{1, \ldots, N\}. \quad (8)$$

Such an auto-correlation expression results in a special auto-correlation matrix as the correlation coefficient $\rho$ approaches 1: to first order in $1 - \rho$, each entry of Equation (8) behaves as $2(1-\rho)\min(i, j)$. In particular, for a block size of $N = 4$, the following correlation matrix $K$ is obtained (up to a positive scale factor):

$$K = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 \\ 1 & 2 & 3 & 3 \\ 1 & 2 & 3 & 4 \end{bmatrix}. \quad (9)$$

The eigenvectors of such correlation matrices have been determined to be the basis vectors of the odd type-3 discrete sine transform (ODST-3), given by [11], [31], [12]

$$[S]_{m,n} = \frac{2}{\sqrt{2N+1}} \sin\left(\frac{(2m-1)\, n \pi}{2N+1}\right), \quad m, n \in \{1, \ldots, N\} \quad (10)$$

where $m$ and $n$ are integers representing the frequency and time index of the basis functions, respectively. Hence, the optimal transform for the spatial prediction residual block is the ODST-3 as $\rho$ approaches 1.

An important observation regarding the ODST-3 is that its first ($m = 1$) and most important basis function has smaller values at the beginning of the block (i.e., closer to the prediction boundary) and larger values towards the end. This trend is due to the fact that block pixels closer to the prediction boundary are predicted better than those further away from it, i.e., the variance of the prediction residual samples grows with their distance from the prediction boundary [11], [31], [12].

B. Coding gain in lossy and lossless transform coding

In lossy transform coding, the transform design problem reduces to searching for an orthogonal transform that minimizes the product of the transform coefficient variances [32]. The optimal solution, i.e., transform, is given by the eigenvectors of the source correlation matrix, and the most commonly used name for this transform is the Karhunen-Loeve transform (KLT). Based on this transform design problem, a figure of merit called the coding gain $G$ of an orthogonal transform $T$ is defined in the literature as follows:

$$G(T, K_N) = 10 \log_{10} \frac{\left(\prod_{i=1}^{N} \sigma_{r,i}^2\right)^{1/N}}{\left(\prod_{i=1}^{N} \sigma_{R,i}^2\right)^{1/N}} \quad (11)$$

Here, $N$ is the block length of the signal $r(i)$, $i = 1, \ldots, N$; $K_N$ is the correlation matrix of the signal, whose diagonal entries are $\sigma_{r,i}^2$, i.e., $\sigma_{r,i}^2$ is the variance of the $i$th input sample; and $\sigma_{R,i}^2$ is the variance of the $i$th transform coefficient, i.e., the $i$th output sample. Note that this coding gain expression is obtained under assumptions such as a Gaussian source, high-rate quantization and optimal bit allocation [32].

In this paper, we are primarily interested in lossless coding, in particular with integer-to-integer (i2i) transforms. Goyal shows in [7] that, under similar assumptions such as a Gaussian source and optimal bit allocation, the i2i transform design problem for lossless coding reduces to a similar search for a transform that, again, minimizes the product of the transform coefficient variances, but the search is over all transforms with a determinant of 1 (instead of over orthogonal transforms as in lossy transform coding). Since we construct the i2i transforms in this paper from cascaded lifting steps, all of them have a determinant of 1 (each lifting step is a unit-triangular matrix, whose determinant is 1).
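The quantities above can be evaluated with a minimal NumPy sketch of our own (hypothetical function names): it builds $K_N$ from Equation (8), the ODST-3 from Equation (10) and an orthonormal DCT-II, and evaluates the coding gain of Equation (11). The value $\rho = 0.95$ in the usage lines is an arbitrary choice, and the printed gains are not results from the paper. The last lines check that the ODST-3 diagonalizes the $\rho \to 1$ matrix of Equation (9).

```python
import numpy as np

def residual_correlation(N: int, rho: float) -> np.ndarray:
    """Equation (8): E[r(i) r(j)] = rho^|i-j| - rho^i - rho^j + 1, i, j = 1..N."""
    idx = np.arange(1, N + 1)
    I, J = np.meshgrid(idx, idx, indexing="ij")
    return rho ** np.abs(I - J) - rho ** I - rho ** J + 1.0

def odst3(N: int) -> np.ndarray:
    """Equation (10); row m is the m-th basis function, column n the time index."""
    m = np.arange(1, N + 1)[:, None]
    n = np.arange(1, N + 1)[None, :]
    return 2.0 / np.sqrt(2 * N + 1) * np.sin((2 * m - 1) * n * np.pi / (2 * N + 1))

def dct2(N: int) -> np.ndarray:
    """Orthonormal DCT-II; rows are basis functions."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    T = np.sqrt(2.0 / N) * np.cos((2 * n + 1) * k * np.pi / (2 * N))
    T[0, :] /= np.sqrt(2.0)
    return T

def klt(K: np.ndarray) -> np.ndarray:
    """Eigenvectors of the correlation matrix, as rows."""
    _, V = np.linalg.eigh(K)
    return V.T

def coding_gain(T: np.ndarray, K: np.ndarray) -> float:
    """Equation (11) in dB: geometric mean of input over coefficient variances."""
    in_var = np.diag(K)
    out_var = np.diag(T @ K @ T.T)
    return 10.0 * (np.mean(np.log10(in_var)) - np.mean(np.log10(out_var)))

K = residual_correlation(4, 0.95)
for name, T in [("identity", np.eye(4)), ("DCT-II", dct2(4)),
                ("ODST-3", odst3(4)), ("KLT", klt(K))]:
    print(f"{name:8s} {coding_gain(T, K):6.2f} dB")

Kmin = np.minimum.outer(np.arange(1, 5), np.arange(1, 5))  # Equation (9)
D = odst3(4) @ Kmin @ odst3(4).T
print(np.allclose(D, np.diag(np.diag(D))))  # True
```

Because the KLT diagonalizes $K_N$, no orthogonal transform can exceed its coding gain, which is why it serves as the reference point in the comparison.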
Hence, in this paper, we also use the coding gain expression in Equation (11) to design and evaluate the i2i transforms used for lossless compression.

Notice that the search in the i2i transform design problem is over all transforms with a determinant of 1, instead of over all orthogonal transforms as in transform design for lossy transform coding [7]. Every orthogonal transform has a determinant of ±1, and negating one of its basis vectors, which leaves all coefficient variances unchanged, turns a determinant of −1 into 1. Hence, the search in i2i transform design is over a larger set of transforms, and the coding gain obtained with i2i transforms can be larger than that of the KLT, i.e., the maximum obtainable with orthogonal transforms [7].

In summary, one of the most important metrics of a transform used in compression applications is the coding gain. A transform with a higher coding gain can achieve higher compression performance, provided that the subsequent processing stages, such as quantization (if present) and entropy coding, are performed properly. In this paper, we use the coding gain expression in Equation (11) to design i2i transforms for lossless compression and to evaluate and compare the performance of various transforms.

C. Approximation of 4-point odd type-3 DST (ODST-3) through plane rotations
While the widely used DCT has computationally efficient factorizations based on butterfly-structured implementations [27], [28], [9], such exact factorizations of the odd type-3 DST (ODST-3) do not exist. This is because the denominator 2N + 1 of the ODST-3's basis function in Equation (10) does not, in general, decompose into a product of small integers and, in particular, is never a power of 2 [33].

While exact factorizations based on butterflies and plane rotations are not possible for the ODST-3, it is still possible to seek approximations of the transform by cascading plane rotations. In this section, we discuss a general framework for such approximations and measure the approximation accuracy via the coding gain defined in Equation (11).

A plane rotation with an angle α that processes the i-th and j-th branches of a length-N signal can be represented with the following N × N matrix:

$$P(i, j, \alpha) = \begin{bmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots & & \vdots \\ 0 & \cdots & \cos\alpha & \cdots & \sin\alpha & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & -\sin\alpha & \cdots & \cos\alpha & \cdots & 0 \\ \vdots & & \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{bmatrix} \qquad (12)$$

where the four sinusoidal terms appear at the intersections of the i-th and j-th rows and columns. In particular, the non-zero elements of P(i, j, α) are given by:

$$[P(i,j,\alpha)]_{i,i} = \cos\alpha, \quad [P(i,j,\alpha)]_{j,j} = \cos\alpha, \quad [P(i,j,\alpha)]_{j,i} = -\sin\alpha, \quad [P(i,j,\alpha)]_{i,j} = \sin\alpha, \quad [P(i,j,\alpha)]_{k,k} = 1, \; k \neq i, j. \qquad (13)$$

When cascading plane rotations, the degrees of freedom for each plane rotation P(i, j, α) are the pair of branches (i, j) to process with the plane rotation, and the rotation angle α. Hence, in cascading plane rotations to approximate the ODST-3, the problem reduces to finding a given number L of ordered branch pairs $(i_k, j_k)$ and rotation angles $\alpha_k$ so that the coding gain of the cascaded plane rotations, i.e.
the obtained overall transform $\prod_{k=1}^{L} P(i_k, j_k, \alpha_k)$, is maximized for a block-based spatial prediction residual signal r(i), i ∈ {1, ..., N}, with correlation matrix $K_N$ whose entries are given by the correlation expression in Equation (8). This problem can be formalized as the following optimization problem:

$$\max_{i_1, j_1, \alpha_1, \ldots, i_L, j_L, \alpha_L} \; G\!\left(\prod_{k=1}^{L} P(i_k, j_k, \alpha_k),\, K_N\right) \quad \text{subject to} \quad i_k \neq j_k, \;\; \alpha_k \in [0, \pi/2). \qquad (14)$$

This optimization problem does not have a simple solution. The optimization parameters $i_k$ and $j_k$, k ∈ {1, ..., L}, are discrete, and each of them takes an integer value from the set {1, ..., N}. Thus, the search space of all the discrete optimization parameters $(i_1, j_1, i_2, j_2, \ldots, i_L, j_L)$ contains about $\binom{N}{2}^{L}$ points, since there are about $\binom{N}{2}^{L}$ ways to choose the L ordered branch pairs to which cascaded plane rotations can be applied. The optimization function G does not have any special properties over this discrete search space, and each point in it must be searched exhaustively. For each search point, i.e. each possible sequence of ordered branch pairs, the rotation angles $(\alpha_1, \ldots, \alpha_L)$ need to be searched, too, to find the maximum of the optimization function G, i.e. the overall coding gain. In summary, to find an optimal or near-optimal solution to the optimization problem in Equation (14), one needs to exhaustively search the space of the discrete optimization parameters $(i_1, j_1, i_2, j_2, \ldots, i_L, j_L)$, and for each point in the search space, the rotation angles $(\alpha_1, \ldots, \alpha_L)$ can be searched by employing a gradient-descent-type algorithm.

As the block size N increases, the described solution approach quickly becomes computationally unmanageable. The number of points $\binom{N}{2}^{L}$ in the search space of the discrete parameters grows quickly with N. In particular, assuming a total of $L = \frac{N}{2}\log_2 N$ plane rotations (i.e.
similar number of rotations as in an N-point FFT [34]), the total number of search points is $\binom{N}{2}^{L} \simeq (N^2/2)^{\frac{N}{2}\log_2 N}$. For a block size of N = 4, this corresponds to $\binom{4}{2}^{4} = 1296$ search points, which is manageable. For a block size of N = 8, however, the number of search points becomes $\binom{8}{2}^{12} \approx 2.3 \times 10^{17}$, which is too large. Hence, for block sizes larger than N = 4, a different approach is required.

A possible approach is to use a faster but sub-optimal greedy algorithm, as in [35], to solve the optimization problem in a stage-by-stage manner. In each stage, only one rotation $P(i_k, j_k, \alpha_k)$ is considered, and its coding gain is maximized using the output signal of the previous stage as the input. However, in our implementation, such a greedy approach provides solutions with significantly lower coding gains than the KLT. An alternative approach is to use the even type-3 DST (EDST-3) [31], which can be factored into a cascade of plane rotations [36], as an approximation to the ODST-3 [33]. We pursue the latter approach for designing i2i transforms for lossless compression of block-based spatial prediction residuals with large blocks and discuss this topic further in Section IV-E. In this section, we continue our discussion for a block size of N = 4.

Thus, for a spatial prediction residual block of size N = 4 and a correlation parameter of ρ = 0. ., we solve the optimization problem in Equation (14) with the solution approach described above. In particular, we exhaustively search the space of the discrete optimization parameters $(i_1, j_1, i_2, j_2, \ldots, i_L, j_L)$, and for each point in the search space, we search for the best rotation angles $(\alpha_1, \ldots, \alpha_L)$ using the optimization toolbox of Matlab. We obtain solutions for different numbers of total plane rotations L. The coding gains of the resulting approximations, calculated from Equation (11), are shown in Table I along with those of other common transforms.

TABLE I
THEORETICAL CODING GAINS (IN dB) OF VARIOUS ORTHOGONAL TRANSFORMS RELATIVE TO THAT OF THE KLT, ALL APPLIED TO THE BLOCK-BASED SPATIAL PREDICTION RESIDUAL WITH A BLOCK SIZE OF N = 4 AND CORRELATION PARAMETER ρ = 0. .

DCT       ODST-3    AODST-3 (2)   AODST-3 (3)   AODST-3 (4)   AODST-3 (5)
-0.6211   -0.0009   -0.7593       -0.1023       -0.0059       -0.0001

The results in Table I are given in terms of coding gain relative to that of the optimal transform, the KLT, which achieves a coding gain of . dB. As expected, the DCT has a large coding gain loss of 0.6211 dB. The ODST-3 has a coding gain loss of only 0.0009 dB, since it is optimal as ρ approaches 1. The remaining transforms AODST-3 (L) in Table I represent the obtained approximations to the ODST-3 with L cascaded plane rotations. Their coding gain loss is 0.7593 dB with 2 cascaded plane rotations, and drops to 0.1023 dB and 0.0059
dB with 3 and 4 cascaded plane rotations, respectively. With 5 plane rotations, the coding gain loss is only 0.0001 dB.

The approximation with four cascaded plane rotations, AODST-3 (4), is shown in Figure 7 with the branch pairs and rotation angles of each plane rotation. The output branches are labeled according to their variances, i.e. R[0] has the largest variance and R[3] the smallest. AODST-3 (4) has a very small coding gain loss relative to the KLT and uses the same number of rotations as the factorization of the DCT in Figure 3. Hence, we focus on AODST-3 (4) in the next section to design i2i approximations of the 4-point ODST-3.

Fig. 7. AODST-3 (4), the obtained cascade of 4 plane rotations to approximate the 4-point ODST-3. The output branches are labeled according to their variances, i.e. R[0] has the largest variance and R[3] the smallest.

Note that the coding gains of the AODST-3 (L) listed in Table I are the best coding gains we obtained from our optimization problem using the described solution approach. However, we observed that there are also other near-optimal solutions, i.e. cascaded plane rotations with coding gains very close to those in Table I.

Note also that the obtained approximations with fewer rotations are not necessarily prefixes of those with more rotations. For example, AODST-3 (3) is not equivalent to the cascade of the first three plane rotations in Figure 7. In particular, AODST-3 (3) has both different branch pairs and different rotation angles than the first three rotations in Figure 7.

Finally, note that the plane rotations in the obtained AODST-3 (L) cannot, in general, be applied in parallel, unlike in the DCT factorization in Figure 3, where the first two and the last two rotations can be performed in parallel.
Of course, our solution to the optimization problem can be modified so that only ordered branch pairs that can be implemented in parallel are used in the search. In this case, the best transform with a total of L = 4 rotations becomes the one with the ordered branch pairs (2,4), (1,3), (3,4) and (1,2), and achieves a coding gain of - . dB relative to the KLT.

D. I2i approximation of 4-point odd type-3 DST (ODST-3)

This section discusses the design of integer-to-integer (i2i) transforms that approximate the 4-point odd type-3 DST (ODST-3) based on the approximations AODST-3 (L) we obtained in the previous section. Although the design approach is general and can be applied to any transform obtained from cascaded plane rotations, we focus on the AODST-3 (L) and provide examples based on AODST-3 (4). An overview of the remainder of this section is as follows. We first provide a summary of our design approach. Then we discuss how the design approach was developed. Finally, we provide theoretical coding gains for lossless compression with the obtained i2i approximations of the ODST-3.

1) Summary of design approach:

Our approach to designing i2i approximations of the ODST-3 can be summarized in 3 steps as follows.

Step 1: Given any AODST-3 (L), first, each plane rotation is replaced with one of four possible decompositions into two lifting steps and two scaling factors (shown in Figure 2). For AODST-3 (4) in Figure 7, a possible result is shown in Figure 8.

Step 2: Next, the scaling factors $K_1$ and $K_2$ of each decomposition are commuted with the lifting structures of the following decompositions so that all scaling factors are pushed to the end of each signal branch. (This commutative property is discussed in Figure 9.) For AODST-3 (4) with the lifting decompositions in Figure 8, the result of this step is shown in Figure 10.

Step 3: Finally, the multiple scaling factors $K_{m,n}$ at the end of each signal branch are combined into one scaling factor $B_i$ per branch, and the updated parameters $\tilde{p}$ and $\tilde{u}$ of the lifting structures are quantized for approximation with rationals of the form $k/2^l$ (k and l are integers), so that multiplications with them and the following rounding operations can be implemented with only integer addition and bit-shift operations. A possible result of this step applied to Figure 10 is shown in Figure 11.

The resulting i2i approximation of the 4-point ODST-3 in Figure 11 consists of several cascaded lifting structures with quantized lifting parameters $\hat{p}_m$ and $\hat{u}_m$, followed by scaling factors $B_i$ at the end of each branch. The cascaded lifting structures provide an i2i transform, and the scaling factors $B_i$ at the end can be absorbed into the quantization stage in lossy coding and omitted in lossless coding.
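Step 1 can be checked numerically for a single rotation. The sketch below builds P(i, j, α) from Equation (13) and implements it with one lifting factorization: a predict step with p = −tan α from branch i into branch j, an update step with u = sin α cos α from branch j into branch i, and then scaling factors 1/cos α and cos α. It then confirms that the two implementations agree. This factorization was derived for the sketch, and which of the four types in Figure 2 it corresponds to is not claimed. The integer version quantizes p and u to $k/2^l$ and rounds each product with an add and an arithmetic right shift. That makes the steps exactly invertible on integers, whatever the quantized values are. Branch indices are 0-based, and all helper names are hypothetical.

```python
import math

def plane_rotation(n, i, j, alpha):
    """P(i, j, alpha) of Equation (13), 0-based branch indices."""
    p = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    p[i][i] = math.cos(alpha)
    p[j][j] = math.cos(alpha)
    p[i][j] = math.sin(alpha)
    p[j][i] = -math.sin(alpha)
    return p

def lifting_params(alpha):
    """One lifting factorization of a plane rotation (derived for this
    sketch): predict p into branch j, update u into branch i, then scale."""
    p = -math.tan(alpha)
    u = math.sin(alpha) * math.cos(alpha)
    k1, k2 = 1.0 / math.cos(alpha), math.cos(alpha)
    return p, u, k1, k2

def lifted_rotation(x, i, j, alpha):
    """Real-valued lifting implementation; equals P(i, j, alpha) @ x."""
    p, u, k1, k2 = lifting_params(alpha)
    y = list(x)
    y[j] += p * y[i]          # lifting (predict) step
    y[i] += u * y[j]          # lifting (update) step
    y[i] *= k1                # the two scaling factors come last
    y[j] *= k2
    return y

def quantize(c, l):
    """Round c to the nearest dyadic rational k / 2**l and return k."""
    return round(c * (1 << l))

def int_lift(x, i, j, kp, ku, l):
    """Integer-to-integer lifting with parameters kp/2**l and ku/2**l;
    each product is rounded with an add and an arithmetic right shift."""
    y = list(x)
    half = 1 << (l - 1)
    y[j] += (kp * y[i] + half) >> l
    y[i] += (ku * y[j] + half) >> l
    return y

def int_unlift(y, i, j, kp, ku, l):
    """Exact inverse: undo the steps in reverse order with the same rounding."""
    x = list(y)
    half = 1 << (l - 1)
    x[i] -= (ku * x[j] + half) >> l
    x[j] -= (kp * x[i] + half) >> l
    return x

alpha = math.radians(30.0)
p, u, k1, k2 = lifting_params(alpha)
kp, ku = quantize(p, 3), quantize(u, 3)   # l = 3, i.e. denominators of 8
coeffs = int_lift([17, -4, 250, 9], 0, 2, kp, ku, 3)
print(kp, ku, coeffs, int_unlift(coeffs, 0, 2, kp, ku, 3))
```

Only the lifting steps touch the integer signal; the scaling factors are left out of `int_lift`, which mirrors how the $B_i$ are dropped in lossless coding.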

2) Development of the design approach:

Our 3-step design approach described above was developed using the following observations.

Consider first the following observations regarding Step 1. A plane rotation has the same input-output relation as each of the four types of lifting decompositions into two lifting steps and two scaling factors (see Figure 2), as discussed in Section III-A. This implies that a plane rotation can be replaced with any of the four types of lifting decompositions. However, when the lifting parameters are quantized, the input-output relation of the decompositions deviates from that of the plane rotation, and each type of decomposition may incur a different quantization error and a different deviation. In addition, although all four types of decompositions have equivalent input-output relations, their scaling factors $K_1$, $K_2$ are different, which can become important after Steps 2 and 3 are performed, as discussed in sub-section IV-D3. Thus, the type of decomposition used for each rotation is important, and sub-section IV-D3 discusses how to choose the type for each rotation.

Note that type-3 and type-4 lifting decompositions have permuted outputs (see Figure 2). This means that when a type-3 or type-4 lifting decomposition is used to replace a plane rotation, the output signals in the lower and upper branches are swapped. Hence, when replacing plane rotations in an AODST-3 (L) (e.g. Figure 7), the swapping of signals between branches has to be tracked so that the correct branches are connected in the following lifting decompositions. For example, the first plane rotation in Figure 7 is replaced with a type-4 lifting decomposition in Figure 8. This means that the output signal of the first plane rotation in the top (bottom) branch in Figure 7 is in the bottom (top) branch in Figure 8. Hence, although the second plane rotation in Figure 7 connects branches 2 to 4, the second lifting decomposition in Figure 8 must now connect branches 2 to 1. In summary, after each lifting decomposition, one must keep track of which branch in the new transform structure (e.g. Figure 8) contains which branch from the AODST-3 (L) structure (e.g. Figure 7), and connect the branches of the following lifting decompositions accordingly.

Note that the second and fourth rotations of AODST-3 (4) in Figure 7 connect branches 2 to 4 and 1 to 2, respectively. However, the lifting decompositions of the second and fourth rotations in Figure 8 connect branches 2 to 1 and 4 to 3, respectively. This discrepancy comes from using a type-3 or type-4 decomposition, which have permuted outputs, for one or more preceding plane rotations. This issue is discussed in more detail in sub-section IV-D2.

Fig. 8. Each plane rotation of the AODST-3 (4) in Figure 7 is replaced with one of four types of decompositions into two lifting steps and two scaling factors. When a type-3 or type-4 decomposition is used, the output signal of that decomposition is permuted, which needs to be taken into account when pairing branches in the following decompositions. The types used for the plane rotations in this figure are type-4, type-2, type-3 and type-2, respectively.

Fig. 9. The order of scaling factors $K_a$, $K_b$ and a following lifting structure can be changed so that the scaling factors $K_a$, $K_b$ follow a lifting structure with modified lifting parameters. Note that the two structures are end-to-end equivalent, i.e. they have the same input-output relation.

Now, consider the following observations regarding Step 2. The order of scaling factors and a following lifting structure can be changed so that the scaling factors follow a lifting structure with modified parameters.
This change of order does not change the input-output relation of this local structure and is discussed in Figure 9. Applying this reordering repeatedly to all scaling factors in an AODST-3 (L) implementation with lifting decompositions results in a new transform structure that has the same overall input-output relation, but in which all lifting steps are at the beginning and all scaling factors are at the end. For the AODST-3 (4) implementation in Figure 8, the repeated reordering gives the new transform structure in Figure 10. It can be verified that this new transform structure in Figure 10 has the same input-output relation as the transform structure in Figure 8 or the AODST-3 (4) in Figure 7.

Finally, consider the following observations regarding Step 3. The lifting parameters $\tilde{p}_m$ and $\tilde{u}_m$ are quantized for approximation with rationals of the form $k/2^l$ (k and l are integers) so that multiplications with them and the following rounding operations can be implemented with only integer addition and bit-shift operations, which is a desirable property in video compression. Quantization of the lifting parameters also means that the overall transform starts deviating from the AODST-3 (L), and the coding gain tends to drop. The choice of l provides a trade-off between approximation accuracy (i.e. coding gain loss) and implementation complexity.
A possible result of this step applied to Figure 10 is shown in Figure 11, where l = 3.

Note that such a transform structure, where all lifting steps are at the beginning and all scaling factors are at the end, is convenient for obtaining an i2i transform. The lifting steps at the beginning can provide an integer-to-integer mapping (as discussed in Section III), and the scaling factors $B_i$ at the end of each branch can be absorbed into the quantization stage in lossy coding and omitted in lossless coding.

In lossy coding, the scaling factors $B_i$ can be absorbed into the quantization stage, and this does not change the coding gain of the overall system [37], [32].

In lossless coding, when the scaling factors $B_i$ are omitted (i.e. replaced with 1), the coding gain for lossless compression with the i2i transform again does not change, which can be explained as follows. The denominator of the coding gain expression in Equation (11) becomes $\left(\prod_{i=1}^{N} \sigma_{R,i}^2 B_i^{-2}\right)^{1/N}$ when the scaling factors are omitted, where $B_i$ is the aggregate scaling factor at the end of the i-th branch. Note that each $B_i$ is obtained as a product of several scaling factors $K_{m,n}$, where m = 1, 2, ..., L denotes the plane rotation and n = 1, 2 indicates either of the two scaling coefficients of its decomposition. The product of all of them is $\prod_{i=1}^{N} B_i = \prod_{m=1}^{L} K_{m,1} K_{m,2}$. Since $K_{m,2} = \pm 1/K_{m,1}$ (see Figure 2), the product $\prod_{i=1}^{N} B_i = \pm 1$, so $\prod_{i=1}^{N} B_i^{-2} = 1$, and hence the coding gain does not change when the scaling factors $B_i$ are omitted.

In summary, given an AODST-3 (L), all plane rotations are replaced with one of four types of lifting decompositions into two lifting steps and two scaling factors. The scaling factors from all lifting decompositions are pushed to the end of each branch using the equality in Figure 9. The lifting parameters are quantized to rationals of the form $k/2^l$ so that integer arithmetic can be used for the computations.
The cascade of lifting steps at the beginning of the structure provides an i2i transform, and the scaling factors at the end of each branch can be absorbed into the quantization stage in lossy coding and omitted in lossless coding. Neither option changes the coding gain, in lossy or in lossless coding.

Fig. 10. The transform structure obtained after all scaling factors $K_{m,1}$ and $K_{m,2}$ in Figure 8 are commuted with the following lifting structures so that all scaling factors are pushed to the end of each signal branch. Note that this new transform structure has the same input-output relation as the transform structure in Figure 8 or the AODST-3 (4) in Figure 7.

Fig. 11. An i2i approximation of the odd type-3 DST (ODST-3) consisting of four cascaded lifting structures with quantized lifting parameters $\hat{p}$ and $\hat{u}$. The following scaling factors $B_i$ can be absorbed into the quantization stage in lossy coding and omitted in lossless coding. The quantized lifting parameters $\hat{p}_m$ and $\hat{u}_m$ are rationals of the form $k/2^l$ (k and l are integers) so that multiplications with them and the following rounding operations can be implemented with only integer addition and bit-shift operations.
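Steps 2 and 3, together with the argument that omitting the $B_i$ leaves the coding gain unchanged, can be verified numerically. The sketch below is a minimal illustration, not the design code behind Figures 8 to 11. It uses a hypothetical 4-point cascade of rotations with made-up angles (the angles of Figure 7 are not reproduced here). It replaces every rotation with the same single lifting factorization, one without output permutation, so that no branch bookkeeping is needed, and then commutes all scaling factors to the end. It checks three things: that diag(B) times the lifting part reproduces the rotation cascade, that $\prod_i B_i = 1$, and that Equation (11) gives the same value with and without the $B_i$. The AR(1) correlation is again a stand-in for Equation (8), and lifting-parameter quantization is left out to isolate the commutation step.

```python
import math

def matmul(a, b):
    return [[sum(a[r][t] * b[t][c] for t in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def plane_rotation(n, i, j, alpha):
    """P(i, j, alpha) as in Equation (13), 0-based indices."""
    p = identity(n)
    p[i][i] = p[j][j] = math.cos(alpha)
    p[i][j], p[j][i] = math.sin(alpha), -math.sin(alpha)
    return p

def lift_matrix(n, dst, src, coef):
    """Matrix of one lifting step: y[dst] += coef * y[src]."""
    m = identity(n)
    m[dst][src] = coef
    return m

def push_scalings_to_end(n, rotations):
    """Replace each rotation by two lifting steps plus two scaling factors
    and commute every scaling factor to the end (the Figure 9 identity).
    With pending per-branch scalings s, a step y[dst] += c * y[src] equals
    z[dst] += (c * s[src] / s[dst]) * z[src] on the unscaled signal z."""
    s = [1.0] * n
    lifts = []
    for i, j, alpha in rotations:
        p = -math.tan(alpha)                      # predict: branch i -> j
        u = math.sin(alpha) * math.cos(alpha)     # update:  branch j -> i
        lifts.append((j, i, p * s[i] / s[j]))
        lifts.append((i, j, u * s[j] / s[i]))
        s[i] /= math.cos(alpha)                   # K_{m,1} = 1 / cos(alpha)
        s[j] *= math.cos(alpha)                   # K_{m,2} = cos(alpha)
    return lifts, s                               # s holds the factors B_i

def coding_gain_db(t, k):
    """Equation (11) evaluated for transform t and correlation matrix k."""
    n = len(k)
    cov = matmul(matmul(t, k), transpose(t))
    return 10.0 * sum(math.log10(k[i][i] / cov[i][i]) for i in range(n)) / n

# Hypothetical 4-point cascade (NOT the angles of Figure 7).
n = 4
rotations = [(1, 3, 0.6), (0, 2, 0.9), (2, 3, 0.4), (0, 1, 1.1)]
lifts, b = push_scalings_to_end(n, rotations)
lifting_only = identity(n)
for dst, src, c in lifts:
    lifting_only = matmul(lift_matrix(n, dst, src, c), lifting_only)
cascade = identity(n)
for i, j, alpha in rotations:
    cascade = matmul(plane_rotation(n, i, j, alpha), cascade)
k = [[0.9 ** abs(r - c) for c in range(n)] for r in range(n)]  # AR(1) stand-in
print("B_i:", [round(x, 4) for x in b], "product:", round(math.prod(b), 12))
print("gain with B_i:   ", coding_gain_db(cascade, k))
print("gain without B_i:", coding_gain_db(lifting_only, k))
```

Because each rotation contributes one factor and its reciprocal, the product of the $B_i$ stays at 1. Dropping the $B_i$ only rescales each output variance by $B_i^{-2}$, which cancels in the geometric mean.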

3) Choice of lifting decomposition type for each plane rotation:

One issue that has not yet been addressed in our i2i transform design approach is which of the four types of lifting decompositions should be used to replace each plane rotation in a given AODST-3 (L). Although all four types of lifting decompositions have the same input-output relation as the plane rotation, there are two reasons why one type of decomposition may be preferred over the others to replace a particular plane rotation.

The first reason comes from the quantization of the lifting parameters $\tilde{p}_m$ and $\tilde{u}_m$ in Step 3 of our design approach. When the lifting parameters $\tilde{p}_m$ and $\tilde{u}_m$ are quantized, each type of decomposition may incur a different quantization error for a particular plane rotation with angle $\alpha_m$ and can affect the overall transform differently.

The second reason comes from the scaling factors $B_i$ obtained at the end of each branch in Step 3 of our design approach. When the scaling factors $B_i$ are omitted in lossless coding, the obtained i2i transform is a scaled AODST-3 (L), because with the scaling factors $B_i$ it has the same input-output relation as the AODST-3 (L). That is, the i2i transform obtained by omitting the scaling factors $B_i$ is simply the AODST-3 (L) with its i-th analysis basis function multiplied by $B_i^{-1}$. While this scaling does not change the theoretical coding gain for lossless compression, as discussed in sub-section IV-D2, it can have a significant impact on compression performance if the entropy coder is not aware of it. In our experiments, we use the reference software of HEVC and its standard entropy coder, which is designed for the statistics of the orthogonal DCT or ODST-3. In our compression experiments, we observed that the best compression performance was achieved by i2i approximations of the ODST-3 in which all scaling factors $B_i$ were close to ±1. In other words, the scaling was as small as possible, so that the i2i transform was as close as possible to being orthogonal. For example, the scaling factors $B_i$ in Figure 11 are equal to − . , − . , . and . , respectively. This i2i ODST-3 is one of the "least scaled" i2i approximations of AODST-3 (4) and provides the best compression performance in our experiments.

In summary, there are two reasons why the choice of lifting decomposition type matters when replacing plane rotations. The first is the quantization of the lifting parameters $\tilde{p}_m$ and $\tilde{u}_m$. The second is the omission of the scaling factors $B_i$ in lossless coding, which yields a scaled i2i approximation of AODST-3 (L). One simple approach to choosing the type of lifting decomposition for each plane rotation in a given AODST-3 (L) is to go through all possible combinations of decomposition types for all plane rotations in the AODST-3 (L) (i.e. a total of $4^L$ combinations) and apply Steps 2 and 3 of the design approach to each. One then chooses the combination of types that provides the best coding gain and also the "least scaled" i2i transform, i.e. scaling factors $B_i$ close to ±1. This is how we chose the types of decompositions in Figure 8, and we used the resulting i2i approximation of the ODST-3 in Figure 11 in our experimental results.
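The per-rotation side of this choice can be illustrated directly. The sketch lists four lifting factorizations of a single plane rotation, checks each against the rotation, and picks the one whose scaling factors are closest to ±1. The factorizations were derived for the sketch, and which of them the paper numbers type-1 to type-4 is not claimed. This is a deliberate simplification of the procedure above, which scores all $4^L$ type combinations on the aggregate $B_i$ and the coding gain after Steps 2 and 3. The hypothetical helper `type_assignments` only enumerates those combinations. The sketch shows why permuted-output factorizations become attractive for angles above 45°: their factors are sin α and 1/sin α instead of cos α and 1/cos α.

```python
import math
from itertools import product

def factorizations(alpha):
    """Four lifting factorizations of the 2x2 rotation [[c, s], [-s, c]],
    derived for this sketch (no mapping to Figure 2's type numbers is
    claimed). Entry: (step1, step2, (k_top, k_bottom), swapped); a step
    (dst, src, coef) means y[dst] += coef * y[src] on branches 0 and 1.
    If swapped, the two rotation outputs come out in reverse order."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [
        ((1, 0, -s / c), (0, 1, s * c), (1 / c, c), False),
        ((0, 1, s / c), (1, 0, -s * c), (c, 1 / c), False),
        ((1, 0, c / s), (0, 1, -s * c), (-1 / s, s), True),
        ((0, 1, -c / s), (1, 0, s * c), (-s, 1 / s), True),
    ]

def apply_factorization(fact, x):
    """Run the two lifting steps and the two scalings on a 2-vector."""
    step1, step2, (k0, k1), _ = fact
    y = list(x)
    for dst, src, coef in (step1, step2):
        y[dst] += coef * y[src]
    return [k0 * y[0], k1 * y[1]]

def least_scaled(alpha):
    """Index of the factorization whose scaling factors are closest to +-1
    (smallest max |log|K||). Per-rotation simplification: the paper's
    criterion uses the aggregate B_i of the whole cascade and the coding gain."""
    facts = factorizations(alpha)
    return min(range(4),
               key=lambda t: max(abs(math.log(abs(k))) for k in facts[t][2]))

def type_assignments(num_rotations):
    """All 4**L ways to assign a factorization type to each of L rotations."""
    return list(product(range(4), repeat=num_rotations))

for deg in (20, 44, 46, 70):
    print(deg, "degrees -> factorization", least_scaled(math.radians(deg)))
print(len(type_assignments(4)), "type assignments for L = 4")
```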

4) Coding gains for lossless compression:

We now provide coding gains for lossless compression with the i2i approximations of the 4-point ODST-3 designed using the approach discussed so far. In particular, the lifting-decomposition-based representation of the AODST-3 (4) in Figure 8 and its equivalent form in Figure 10 provide one of the best transform structures for i2i approximation of the 4-point ODST-3. Using this transform structure, we provide coding gains for lossless compression based on Equation (11) under different quantization levels of the lifting parameters $\tilde{p}_m$ and $\tilde{u}_m$. We quantize the lifting parameters to rationals of the form $k/2^l$ (k and l are integers) and provide the obtained coding gains for various values of l. The results are given in Table II.

TABLE II
THEORETICAL CODING GAINS (IN dB, RELATIVE TO THAT OF THE KLT) FOR LOSSLESS COMPRESSION WITH I2I APPROXIMATIONS OF THE 4-POINT ODST-3 WITH VARYING LEVELS OF QUANTIZATION OF LIFTING PARAMETERS, ALL APPLIED TO THE BLOCK-BASED SPATIAL PREDICTION RESIDUAL WITH BLOCK SIZE OF N = 4 AND CORRELATION PARAMETER ρ = 0. .

l    8         7         6         5         4         3         2         1
G    -0.0059   -0.0060   -0.0056   -0.0104   -0.0165   -0.0158   -0.0973   -1.0565

Table II provides the coding gains relative to the coding gain of the KLT. Note that as the quantization step size becomes arbitrarily small, i.e. as l grows arbitrarily large, the obtained i2i transform approaches the AODST-3 (4). Hence, the i2i transform with a large l (l = 8) has the same coding gain loss (0.0059 dB) as the AODST-3 (4) in Table I. As l is reduced, the coding gain losses increase in general, since the obtained i2i transforms deviate more significantly from the AODST-3 (4). The coding gain loss for l ≥ 3 is not significant in practice. For l = 2, the coding gain drop is about 0.1 dB, which can be important in practice. Thus, the quantization level l = 3 seems to be a good trade-off between coding gain loss and complexity of the i2i transform. We choose this value of l, for which the quantized lifting parameters are shown in Figure 11, for our compression experiments within HEVC in Section V.

Note that the coding gain for lossless compression with the RDPCM approach, discussed in Section II-A, can also be calculated from Equation (11). The coding gain of simple DPCM, applied to the (one-dimensional) block-based spatial prediction residual with a block size of N = 4 and correlation parameter ρ = 0. ., is only . dB lower than that of the KLT. This coding gain loss is slightly smaller than that (0.0158 dB) of the i2i transform with l = 3 that we use in our experiments. However, note that the simple DPCM method is used only along the horizontal or vertical direction in the horizontal and vertical intra modes in HEVC or H.264 and is not used in the other angular intra modes.
This is because, in the other angular intra modes, the residual exhibits 2D directional correlation, and a corresponding directional DPCM, designed separately for each angular intra mode, would be required to account for it. However, the additional compression benefits do not justify the additional complexity, and neither HEVC nor H.264 uses such directional DPCM methods. On the other hand, a 2D i2i transform based on the designed i2i approximations of the ODST-3 can be used for all intra modes and does not need to be redesigned or optimized for every intra mode.

E. I2i transforms for block-based spatial prediction residuals with large block sizes

The approach we used in Section IV-C to obtain approximations of the odd type-3 DST (ODST-3) by cascading plane rotations works well for small block sizes, such as N = 4, but becomes computationally unmanageable for block sizes equal to or larger than N = 8. Hence, a different approach is required to approximate the ODST-3 for large block sizes.

TABLE III
THEORETICAL CODING GAINS (IN dB) OF ODST-3, EDST-3 AND DCT RELATIVE TO THAT OF THE KLT, ALL APPLIED TO THE BLOCK-BASED SPATIAL PREDICTION RESIDUAL WITH CORRELATION PARAMETER ρ = 0. . AND VARYING BLOCK SIZES

Block size   4         8         16        32
ODST-3       -0.0009   -0.0024   -0.0045   -0.0072
EDST-3       -0.2174   -0.1376   -0.0797   -0.0468
DCT          -0.6211   -0.5611   -0.4108   -0.2640
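Table III contrasts the ODST-3 with the EDST-3. As a numerical companion, the sketch below builds both transforms as N × N matrices and checks two properties that the surrounding discussion relies on: both are orthonormal, and the first basis function of each increases away from the prediction boundary. The ODST-3 formula used, $(2/\sqrt{2N+1})\sin((2m-1)n\pi/(2N+1))$, is the standard DST-VII and is assumed here to match Equation (10). The EDST-3 formula, $\sqrt{2/N}\sin((2m-1)(2n-1)\pi/(4N))$, is likewise an assumption about Equation (15). The last check uses the covariance min(i, j) of a random walk measured from the prediction boundary. This covariance is taken, as an assumption, to be the ρ → 1 limit of the residual model, and the ODST-3 diagonalizes it exactly.

```python
import math

def odst3(n):
    """ODST-3 (DST-VII) basis, rows m = 1..N, columns n = 1..N.
    Assumed form: (2 / sqrt(2N+1)) * sin((2m-1) n pi / (2N+1))."""
    scale = 2.0 / math.sqrt(2 * n + 1)
    return [[scale * math.sin((2 * m - 1) * k * math.pi / (2 * n + 1))
             for k in range(1, n + 1)] for m in range(1, n + 1)]

def edst3(n):
    """EDST-3 basis in the assumed form
    sqrt(2/N) * sin((2m-1)(2n-1) pi / (4N)), m, n = 1..N."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.sin((2 * m - 1) * (2 * k - 1) * math.pi / (4 * n))
             for k in range(1, n + 1)] for m in range(1, n + 1)]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def is_identity(m, tol=1e-9):
    """True if m is the identity matrix within tol (orthonormality check)."""
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(len(m)) for j in range(len(m)))

def random_walk_cov(n):
    """Covariance min(i, j) of r(i) = x(i) - x(0) for a random walk x;
    an assumed rho -> 1 limit of the residual model."""
    return [[float(min(i, j)) for j in range(1, n + 1)] for i in range(1, n + 1)]

for size in (4, 8, 16):
    for name, t in (("ODST-3", odst3(size)), ("EDST-3", edst3(size))):
        first = t[0]
        rising = all(a < b for a, b in zip(first, first[1:]))
        print(size, name, "orthonormal:", is_identity(matmul(t, transpose(t))),
              "first basis rising:", rising)

s = odst3(4)
d = matmul(matmul(s, random_walk_cov(4)), transpose(s))
off_diag = max(abs(d[i][j]) for i in range(4) for j in range(4) if i != j)
print(f"largest off-diagonal entry of S K S^T: {off_diag:.2e}")
```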

One possible approach is to use the even type-3 DST (EDST-3) [31], which can be factored into a cascade of plane rotations [36], as an approximation of the ODST-3 [33]. Han et al. use the EDST-3 in lossy coding within the VP9 codec to transform the block-based spatial prediction residuals of 8x8 blocks and report compression results very close to those with the ODST-3 [33].

The basis functions of the EDST-3 are given by

$$[E]_{m,n} = \sqrt{\frac{2}{N}} \sin\left(\frac{(2m-1)(2n-1)\pi}{4N}\right), \qquad m, n \in \{1, \ldots, N\} \qquad (15)$$

where m and n are integers representing the frequency and time indices of the basis functions, respectively. When the first (m = 1) and most important basis function is plotted, one can see that, similar to the first basis function of the ODST-3 in Equation (10), it has smaller values at the beginning of the block (i.e. closer to the prediction boundary) and larger values towards the end. This implies that the EDST-3 may have a good coding gain for block-based spatial prediction residuals, in particular a better one than the conventionally used DCT.

Table III lists the coding gain losses of the ODST-3 and EDST-3 with respect to the KLT for a spatial prediction residual block with correlation coefficient ρ = 0. . at various block sizes. It can be seen from the table that the coding gain loss of the EDST-3 with respect to the ODST-3 becomes smaller as the block size N increases. The coding gain loss of the EDST-3 with respect to the ODST-3 is 0.2165 dB for a block size of N = 4, drops to 0.1352 dB for a block size of N = 8, and drops further for larger block sizes.
The coding gains of the DCT are also shown in Table III for comparison.

For large block sizes, the coding gain loss arising from using the EDST-3 instead of the ODST-3 can be a good trade-off for the reduction in computational complexity. The ODST-3 must be implemented with a general matrix multiplication (with complexity ∝ N²), while the EDST-3 can be implemented with a cascade of plane rotations with complexity ∝ N log N.

The coding gains in Tables I and III indicate that the AODST-3 (4) derived in Section IV-C has a better coding gain than the EDST-3 for a block size of N = 4. In particular, their coding gain losses with respect to the KLT are 0.0059 and 0.2174 dB, respectively. For larger block sizes, the approach we used in Section IV-C to obtain the AODST-3 (4) becomes computationally unmanageable, and we use the EDST-3 to approximate the ODST-3.

As the block size increases, the EDST-3 becomes a better approximation of the ODST-3, i.e. the coding gain loss arising from using the EDST-3 instead of the ODST-3 decreases. For example, for N = 8 and N = 16, this coding gain loss drops to 0.1352 and 0.0752 dB, respectively. While these coding gain losses may still be considered significant in some contexts, they become insignificant in HEVC, which we use for our experimental results, because block sizes larger than N = 4 are rarely used in lossless compression in HEVC.

The block sizes available in HEVC for intra prediction and transform range from 4x4 to 32x32. However, in lossless compression, large block sizes such as N = 8 or N = 16 (i.e. 8x8 and 16x16 intra prediction blocks) are used much less frequently than the block size of N = 4 (i.e. 4x4 intra prediction blocks), as we show in the experimental results in Section V. This is because the bitrate of the prediction residual dominates the overall bitrate in lossless compression (i.e.
the bitrate of side information, such as intra modes, is a very small fraction of the overall bitrate). Reducing the bitrate of the residual requires better prediction, and prediction is best at the smallest available block size, i.e. the 4x4 block size.

Thus, in lossless compression within HEVC (or any other codec that has 4x4 block intra prediction and transforms), the lossless compression efficiency at the block size of N = 4 dominates the overall lossless compression efficiency of the system, and sub-optimal performance at larger block sizes has an insignificant effect on the overall compression results, as we show in Section V. Nevertheless, to demonstrate this insignificant effect, we design an i2i transform based on the EDST-3 only for the block size N = 8 (i.e. 8x8 intra prediction blocks) and provide experimental results with it in Section V.

The approach we use to design the 8-point i2i transform based on the 8-point EDST-3, in particular on its representation as a cascade of plane rotations, is the same as presented in Section IV-D. We apply the same 3-step procedure. First, each plane rotation in the 8-point EDST-3 is replaced with one of four possible decompositions into two lifting steps and two scaling factors. Next, the scaling factors from all decompositions are pushed to the end of each branch. Finally, the multiple scaling factors at the end of each branch are combined into a single scaling factor per branch, and all lifting parameters are quantized for approximation with rationals of the form $k/2^l$, where we use l = 8. We also chose the type of lifting decomposition for each plane rotation so that the overall i2i transform is as close as possible to being orthogonal. The resulting 8-point i2i transform has a coding gain loss of only . dB relative to the 8-point EDST-3.

V. EXPERIMENTAL RESULTS

The i2i approximation of the 4-point ODST-3 in Figure 11 and the i2i approximation of the 8-point ODST-3 discussed in Section IV-E are implemented in the HEVC version 2 Range Extensions (RExt) reference software (HM-15.0+RExt-8.1) [38] to obtain experimental results for these i2i transforms in lossless intra-frame compression. Both of these i2i transforms are applied first along the horizontal and then along the vertical direction to obtain 4x4 and 8x8 i2i approximations of the 2D ODST-3 for 4x4 and 8x8 intra prediction residual blocks, respectively. These 2D i2i transforms are used in lossless compression to transform the 4x4 and 8x8 intra prediction residual blocks of both the luma and chroma pictures.
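The developed transforms themselves (Figure 11 and Section IV-E) are not reproduced here. As a hedged illustration of the two ingredients this section relies on, namely plane rotations realized as rounded lifting steps with multipliers of the form k/l (l = 8), and separable application along rows and then columns, the Python sketch below builds a hypothetical 4-point i2i transform from four plane rotations. The rotation pairs and angles, the factorization R = diag(1/c, c)·[[1, sc], [0, 1]]·[[1, 0], [-s/c, 1]] of R = [[c, s], [-s, c]], and the choice to simply omit the two scaling factors are all assumptions of this sketch, not the paper's ODST-3 approximation. Without the scaling factors, each lifting pair outputs the rotation result with branch i scaled by c and branch j by 1/c.

```python
import math

L = 8  # lifting multipliers are quantized to k/L with L = 8 (the l = 8 used in the text)

# Hypothetical 4-point network: (branch i, branch j, angle). NOT the paper's Figure 11.
ROTATIONS = ((0, 3, math.pi / 8), (1, 2, math.pi / 16), (0, 1, math.pi / 4), (2, 3, math.pi / 4))


def quantize(m: float) -> int:
    """Integer numerator k such that k/L approximates the real multiplier m."""
    return round(m * L)


def lift(x: int, k: int) -> int:
    """Rounded lifting term round(k * x / L), using integer arithmetic only."""
    return (k * x + L // 2) // L


# R(t) = [[c, s], [-s, c]] = diag(1/c, c) @ [[1, s*c], [0, 1]] @ [[1, 0], [-s/c, 1]].
# Only the two lifting steps are kept; the scaling factors 1/c and c are omitted (assumption).
STAGES = tuple(
    (i, j, quantize(math.sin(t) * math.cos(t)), quantize(-math.tan(t)))
    for i, j, t in ROTATIONS
)


def fwd1d(v: list[int]) -> list[int]:
    x = list(v)
    for i, j, k_upper, k_lower in STAGES:
        x[j] += lift(x[i], k_lower)  # lower lifting step: x_j += approx. (-s/c) * x_i
        x[i] += lift(x[j], k_upper)  # upper lifting step: x_i += approx. (s*c) * x_j
    return x


def inv1d(v: list[int]) -> list[int]:
    x = list(v)
    for i, j, k_upper, k_lower in reversed(STAGES):
        x[i] -= lift(x[j], k_upper)  # undo the steps in reverse order with identical rounding
        x[j] -= lift(x[i], k_lower)
    return x


def fwd2d(block: list[list[int]]) -> list[list[int]]:
    """Apply the 1D i2i transform along rows (horizontal), then along columns (vertical)."""
    rows = [fwd1d(r) for r in block]
    cols = [fwd1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def inv2d(coeffs: list[list[int]]) -> list[list[int]]:
    """Invert fwd2d: undo the vertical pass first, then the horizontal pass."""
    cols = [inv1d(list(c)) for c in zip(*coeffs)]
    rows = [list(r) for r in zip(*cols)]
    return [inv1d(r) for r in rows]


# Usage: an integer 4x4 residual block is recovered exactly from its integer coefficients.
residual = [[3, -1, 0, 2], [1, 0, -2, 4], [0, 5, 1, -1], [-3, 2, 2, 0]]
coeffs = fwd2d(residual)
assert inv2d(coeffs) == residual
```

Exact invertibility holds for any multipliers because the inverse recomputes the same rounded lifting terms from values that are still available; the quality of the approximation, not losslessness, is what depends on the chosen angles and on k/l.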

A. Experimental Setup

To evaluate the performance of the developed i2i transforms, the following systems are derived from the reference software and compared in terms of lossless intra-frame compression performance and complexity:
• HEVCv1
• HEVCv2
• i2iDST4
• i2iDST4+RDPCM
• i2iDST4&8
• i2iDST4&8+RDPCM
The employed processing in each of these systems is summarized in Table IV and discussed below.

The HEVCv1 system represents HEVC version 1, which simply skips transform and quantization and sends the prediction residual block to the entropy coder without any further processing, as discussed in Section II.

The HEVCv2 system represents HEVC version 2, in which horizontal RDPCM is applied in the horizontal intra mode at all available block sizes from 4x4 to 32x32, and vertical RDPCM is applied in the vertical intra mode at all available block sizes. For all the other 33 intra modes at all available block sizes, the prediction residual is sent to the entropy coder without processing.

The remaining systems employ the developed i2i approximations of the ODST-3. In the i2iDST4 system, the RDPCM system of the HEVC reference software is disabled in 4x4 intra prediction blocks, and the 4x4 i2i 2D ODST-3 is used in all modes of 4x4 intra prediction residual blocks. In larger blocks, the default HEVCv2 processing, i.e. RDPCM in horizontal and vertical modes, is used.

In the i2iDST4+RDPCM system, the i2i transform and RDPCM methods are combined in 4x4 block intra coding. In other words, in intra coding of 4x4 intra prediction blocks, the RDPCM method of HEVCv2 is used if the intra prediction mode is horizontal or vertical, and the 4x4 i2i 2D ODST-3 is used for the other intra prediction modes. In larger blocks, the default HEVCv2 processing, i.e. RDPCM in horizontal and vertical modes, is used.

In the i2iDST4&8 system, the RDPCM system of the HEVC reference software is disabled in 4x4 and 8x8 intra prediction residual blocks, and the 4x4 and 8x8 i2i 2D ODST-3 are used in all modes of 4x4 and 8x8 intra prediction residual blocks. In larger residual blocks, such as 16x16 or 32x32 blocks, the default HEVCv2 processing, i.e. RDPCM in horizontal and vertical modes, is used.

Finally, in the i2iDST4&8+RDPCM system, the i2i transform and RDPCM methods are combined in 4x4 and 8x8 block intra coding. In other words, in intra coding of 4x4 and 8x8 intra prediction blocks, the RDPCM method of HEVCv2 is used if the intra prediction mode is horizontal or vertical, and the 4x4 or 8x8 i2i 2D ODST-3 is used for the other intra prediction modes. In larger blocks, the default HEVCv2 processing, i.e. RDPCM in horizontal and vertical modes, is used. We are planning to share the source code of our modified reference software, from which all these systems can be obtained, on github.com.

TABLE IV
PROCESSING OF INTRA PREDICTION RESIDUAL BLOCKS PRIOR TO ENTROPY CODING IN EACH SYSTEM

Block, intra mode | HEVCv1 | HEVCv2 | i2iDST4 | i2iDST4+RDPCM | i2iDST4&8 | i2iDST4&8+RDPCM
4x4 hor/ver intra | - | hor/ver RDPCM | 4x4 i2i 2D DST | hor/ver RDPCM | 4x4 i2i 2D DST | hor/ver RDPCM
4x4 other intra | - | - | 4x4 i2i 2D DST | 4x4 i2i 2D DST | 4x4 i2i 2D DST | 4x4 i2i 2D DST
8x8 hor/ver intra | - | hor/ver RDPCM | hor/ver RDPCM | hor/ver RDPCM | 8x8 i2i 2D DST | hor/ver RDPCM
8x8 other intra | - | - | - | - | 8x8 i2i 2D DST | 8x8 i2i 2D DST
larger hor/ver intra | - | hor/ver RDPCM | hor/ver RDPCM | hor/ver RDPCM | hor/ver RDPCM | hor/ver RDPCM
larger other intra | - | - | - | - | - | -

TABLE V
AVERAGE PERCENTAGE (%) BITRATE REDUCTION AND ENCODING/DECODING TIMES OF SEVERAL SYSTEMS WITH RESPECT TO THE HEVCv1 SYSTEM IN LOSSLESS INTRA CODING FOR ALL-INTRA-MAIN SETTINGS

Class | HEVCv2 | i2iDST4 | i2iDST4+RDPCM | i2iDST4&8 | i2iDST4&8+RDPCM
Class A | 7.2 | 11.7 | 12.1 | 12.1 | 12.6
Class B | 4.5 | 6.3 | 6.5 | 6.4 | 6.7
Class C | 5.3 | 6.3 | 7.0 | 6.3 | 7.1
Class D | 7.5 | 8.4 | 9.4 | 8.2 | 9.5
Class E | 8.2 | 9.2 | 10.5 | 8.8 | 10.5
Average | 6.4 | 8.3 | 8.9 | 8.2 | 9.1
Enc. T. | ?.6% | 99.0% | 99.6% | 107.2% | 103.?%
Dec. T. | ?.9% | 95.3% | 98.0% | 95.8% | 97.?%

Table IV summarizes the processing in all systems. In all systems except the HEVCv1 system, the available RExt tools, such as a dedicated context model for the significance map, Golomb-Rice parameter adaptation, intra reference smoothing and residual rotation [16], [25], are used. However, the residual rotation RExt tool is not used with i2i transforms, since i2i transforms already compact the residual energy into the lower-frequency transform coefficients.
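The per-block decisions in Table IV, together with the RDPCM step that HEVCv2 applies in the horizontal and vertical modes, can be sketched as follows. This is a minimal illustration of the RDPCM idea and of the table's dispatch logic, not code from the HM reference software; the function names and the dictionary I2I_SIZES are hypothetical. The mode numbers 10 and 26 are HEVC's indices for pure horizontal and vertical intra prediction. The 2D i2i ODST-3 itself is not included; the dispatcher only reports which processing would be applied.

```python
HOR_MODE, VER_MODE = 10, 26  # HEVC intra mode numbers of pure horizontal / vertical prediction

# Block sizes in which each system applies the 2D i2i ODST-3 (hypothetical encoding of Table IV).
I2I_SIZES = {
    "HEVCv1": set(), "HEVCv2": set(),
    "i2iDST4": {4}, "i2iDST4+RDPCM": {4},
    "i2iDST4&8": {4, 8}, "i2iDST4&8+RDPCM": {4, 8},
}


def processing(system: str, size: int, mode: int) -> str:
    """Residual processing ('none', 'rdpcm' or 'i2i_dst') for one block, following Table IV."""
    if system == "HEVCv1":
        return "none"
    hor_ver = mode in (HOR_MODE, VER_MODE)
    if size in I2I_SIZES[system]:
        # '+RDPCM' systems keep RDPCM for hor/ver modes and use the i2i DST for all others.
        return "rdpcm" if hor_ver and system.endswith("+RDPCM") else "i2i_dst"
    return "rdpcm" if hor_ver else "none"  # default HEVCv2 processing


def rdpcm_forward(block: list[list[int]], mode: int) -> list[list[int]]:
    """Replace each residual by its difference from the left (horizontal) or upper (vertical) neighbour."""
    out = [row[:] for row in block]
    for y, row in enumerate(block):
        for x in range(len(row)):
            if mode == HOR_MODE and x > 0:
                out[y][x] = block[y][x] - block[y][x - 1]
            elif mode == VER_MODE and y > 0:
                out[y][x] = block[y][x] - block[y - 1][x]
    return out


def rdpcm_inverse(diff: list[list[int]], mode: int) -> list[list[int]]:
    """Undo rdpcm_forward by accumulating the differences in the same scan direction."""
    out = [row[:] for row in diff]
    for y, row in enumerate(diff):
        for x in range(len(row)):
            if mode == HOR_MODE and x > 0:
                out[y][x] += out[y][x - 1]  # left neighbour is already reconstructed
            elif mode == VER_MODE and y > 0:
                out[y][x] += out[y - 1][x]  # upper neighbour is already reconstructed
    return out
```

For example, processing("i2iDST4+RDPCM", 4, 10) selects RDPCM while processing("i2iDST4+RDPCM", 4, 18) selects the i2i DST, matching the first two rows of Table IV. Because RDPCM only forms integer differences along one direction, it is trivially lossless, which is why it can be switched on per mode without any change to the decoder's reconstruction precision.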

B. Lossless Intra-frame Compression Results

For the experimental results, the common test conditions in [39] are followed, except that only the first 150 frames of every sequence are coded due to our limited computational resources. The results are shown in Table V, which includes average percentage (%) bitrate reductions and encoding/decoding times of all systems with respect to the HEVCv1 system for All-Intra-Main encoding settings [39].

Consider first the results of the HEVCv2, i2iDST4 and i2iDST4+RDPCM systems in Table V. Their average (averaged over all sequences in all classes) bitrate savings with respect to the HEVCv1 system are 6.4%, 8.3% and 8.9%, respectively. Notice also from the results in the table that the systems employing the developed 4-point i2i ODST-3, i.e. i2iDST4 and i2iDST4+RDPCM, achieve consistently larger bitrate reductions than HEVCv2 in all classes.

Note also that the i2iDST4+RDPCM system performs better than the i2iDST4 system in all classes. In other words, the results indicate that the best lossless compression performance is achieved when RDPCM is used for the horizontal and vertical intra modes and the i2i 2D ODST-3 for the other intra modes, as in the i2iDST4+RDPCM system. This is because the residual in the horizontal and vertical intra modes can be modeled well with a separable 2D correlation (with much larger correlation along the prediction direction than along the perpendicular direction) [40], and thus the simple horizontal or vertical DPCM is a great fit and can achieve very good compression performance in these modes, as indicated by its good theoretical coding gain discussed in sub-section IV-D4. In the remaining intra modes, the horizontal or vertical RDPCM method would not work well (see sub-section IV-D4), but the designed i2i ODST-3 can provide good compression gains.

Consider now also the results of the i2iDST4&8 and i2iDST4&8+RDPCM systems in Table V. As shown in Table IV, these systems use an i2i approximation of the ODST-3 also in 8x8 blocks, in addition to that in 4x4 blocks.
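As a quick, hedged cross-check of the per-class statements in this subsection, the Python sketch below copies the per-class rows of Table V and computes, class by class, how many additional percentage points of the HEVCv1 bitrate one system saves over another. The helper name per_class_gain is hypothetical. The Average row is left out because it is averaged over all sequences rather than over the five classes, so it cannot be recomputed from these rows.

```python
# Per-class average bitrate reductions (%) with respect to HEVCv1, copied from Table V.
SYSTEMS = ("HEVCv2", "i2iDST4", "i2iDST4+RDPCM", "i2iDST4&8", "i2iDST4&8+RDPCM")
TABLE_V = {
    "Class A": (7.2, 11.7, 12.1, 12.1, 12.6),
    "Class B": (4.5, 6.3, 6.5, 6.4, 6.7),
    "Class C": (5.3, 6.3, 7.0, 6.3, 7.1),
    "Class D": (7.5, 8.4, 9.4, 8.2, 9.5),
    "Class E": (8.2, 9.2, 10.5, 8.8, 10.5),
}


def per_class_gain(system: str, reference: str) -> dict[str, float]:
    """Additional saving (percentage points of the HEVCv1 bitrate) of `system` over `reference`."""
    s, r = SYSTEMS.index(system), SYSTEMS.index(reference)
    # round() removes binary floating-point noise from the one-decimal table values
    return {cls: round(row[s] - row[r], 1) for cls, row in TABLE_V.items()}


for system, reference in (("i2iDST4&8", "i2iDST4"), ("i2iDST4&8+RDPCM", "i2iDST4+RDPCM")):
    print(f"{system} vs {reference}: {per_class_gain(system, reference)}")
```

For the i2iDST4&8 and i2iDST4&8+RDPCM systems, the differences relative to i2iDST4 and i2iDST4+RDPCM are +0.4, +0.1, 0.0, -0.2, -0.4 and +0.5, +0.2, +0.1, +0.1, 0.0 points in Classes A to E, respectively, i.e. at most half a percentage point, and of varying sign for i2iDST4&8.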
The bitrate savings achieved by these systems, however, do not provide significant or consistent increases on top of those provided by the i2iDST4 and i2iDST4+RDPCM systems. In other words, using the i2i ODST-3 in 4x4 blocks seems to provide most of the achievable compression gain, and also using an i2i approximation of the ODST-3 in 8x8 blocks does not seem to provide a significant increase in lossless compression performance. This is reminiscent of the similar situation in lossy coding, where HEVC uses the ODST-3 only in 4x4 intra blocks, and the additional compression gain from using the ODST-3 (instead of the conventional DCT) in lossy coding of larger intra blocks is small and does not justify the additional computational complexity of the ODST-3 over the DCT [41].

Finally, consider also the average encoding and decoding times of all systems in Table V. They are given relative to those of HEVCv1, i.e. the encoding and decoding times of the HEVCv1 system are taken as 100%. The HEVCv2, i2iDST4, and i2iDST4+RDPCM systems achieve lower encoding and decoding times than HEVCv1, despite their additional processing of the residuals, mainly due to their lower bitrates, which allow the complex entropy coding/decoding to finish faster. The i2iDST4&8 and i2iDST4&8+RDPCM systems have longer encoding times than HEVCv1, since the 8-point i2i approximation of the ODST-3 requires more computation than the 4-point i2i ODST-3 or the DPCM method; however, their decoding times are shorter than those of HEVCv1, since the 8-point inverse i2i ODST-3 is rarely used at the decoder, as we discuss in Section V-C.

The results of the i2iDST4&8 and i2iDST4&8+RDPCM systems in Table V indicate that when the i2i ODST-3 is used in 4x4 blocks, also using an i2i approximation of the ODST-3 in 8x8 blocks does not improve lossless intra-frame compression significantly or consistently. To analyze this result further, we perform a new set of experiments. We disable the use of 4x4 intra prediction blocks in all systems so that the smallest block size for intra prediction is 8x8, which allows us to investigate how the systems compare when only the 8-point i2i ODST-3 is available. In other words, in this new set of experiments, the processing of intra prediction residual blocks prior to entropy coding is the same as in Table IV, except that the top two rows with 4x4 block processing are not allowed in any system. (Note that in this case the i2iDST4 and i2iDST4+RDPCM systems become identical to HEVCv2.) The compression results are presented in Table VI.

TABLE VI
AVERAGE PERCENTAGE (%) BITRATE REDUCTION OF SEVERAL SYSTEMS WITH MINIMUM ALLOWED BLOCK SIZE OF 8x8 FOR INTRA PREDICTION WITH RESPECT TO HEVCv1 THAT HAS MINIMUM ALLOWED BLOCK SIZE OF 4x4

Note that the results in Table VI are bitrate savings with respect to HEVCv1 in the initial set of experiments, i.e. HEVCv1 that has access to all block sizes from 4x4 to 32x32, so that these results can also be easily compared to those in Table V. The results in Table VI indicate that in this new set of experiments, systems employing i2i approximations of the ODST-3 achieve similar compression gains with respect to HEVCv2. In particular, the i2iDST4&8+RDPCM system achieves an average bitrate reduction of . and . with respect to HEVCv2 in Tables V and VI, respectively. In summary, from the results in Tables V and VI, it can be concluded that the developed 8-point i2i approximation of the ODST-3 can achieve significant compression gains if it is the smallest-point i2i ODST-3 used in the system, but its contribution to the overall compression performance becomes insignificant if it is used together with the 4-point i2i ODST-3.

C. Block size and intra mode statistics

It is also useful to gain additional insight by looking at how often the available block sizes and intra modes are used in lossless compression within the systems we compare in this section. For this purpose, we use the initial experiments, where all block sizes from 4x4 to 32x32 are available for intra prediction in all systems. Figure 12 shows the percentage of pixels that are coded in each available block size in the HEVCv1, HEVCv2, i2iDST4+RDPCM and i2iDST4&8+RDPCM systems for all classes, and Table VII summarizes the average of these statistics (averaged over all sequences in all classes).

The most important observation from the percentages in Figure 12 is that the 4x4 block size is by far the most frequently used block size in all of the systems for all sequence classes. This is because the bitrate of the prediction residual dominates the overall bitrate in lossless compression (i.e. the bitrate of side information, such as intra modes, is a very small fraction of the overall bitrate), and reducing the bitrate of the residual requires better prediction, which is typically best at the smallest available block size, i.e. the 4x4 block size.

A closer look at the percentages in Table VII shows that while the percentages of pixels coded in 4x4 and 8x8 blocks are 89.6% and 8.6% in HEVCv1, respectively, they change to 77.2% and 17.7% in HEVCv2. This is because HEVCv1 does not process the block-based spatial prediction residual, and the encoder chooses 4x4 blocks almost exclusively, except in very flat regions where prediction with 4x4 or 8x8 blocks is almost identical. In HEVCv2, the prediction residual is processed with RDPCM in horizontal and vertical intra modes, which improves the prediction performance in larger (and smaller) blocks, and thus larger blocks are used more often in HEVCv2.

Let us also observe these percentages in the systems utilizing i2i approximations of the ODST-3. In the i2iDST4+RDPCM system, the percentage of pixels coded in 4x4 blocks increases back to 88.1%, while the percentage of pixels coded in 8x8 blocks decreases to 8.7%. This change is due to the i2i ODST-3 in 4x4 blocks, which improves the lossless compression performance in 4x4 blocks, so the encoder chooses 4x4 blocks more often. In the i2iDST4&8+RDPCM system, the percentage of pixels coded in 4x4 blocks decreases back to 79.4%, while the percentage of pixels coded in 8x8 blocks increases back to 17.4%. This change is due to also using the i2i ODST-3 in 8x8 blocks, which can slightly improve lossless compression performance in 8x8 blocks, so the encoder chooses 8x8 blocks more often.

Table VIII shows how the percentages of the 4x4 and 8x8 blocks in Table VII are distributed over the 35 intra modes. In particular, we consider the aggregate of the horizontal and vertical intra modes and the aggregate of the remaining intra modes. Table VIII shows that in HEVCv1, 15.0% of pixels are coded in 4x4 block horizontal and vertical intra modes and 74.6% in the remaining modes, and these numbers change to 41.2% and 36.0% in HEVCv2. This change happens because HEVCv2 uses RDPCM in horizontal and vertical modes, and RDPCM improves compression performance, causing the encoder to choose these modes more often. In the i2iDST4+RDPCM system, the i2i ODST-3 is used in the other (i.e. not horizontal or vertical) modes of 4x4 blocks, and this increases the percentage of these modes to 62.2%, since the i2i ODST-3 improves compression performance and thus the encoder chooses these modes more often. In the i2iDST4&8+RDPCM system, the i2i approximation of the ODST-3 is also used in the other modes of 8x8 blocks, and this increases the percentage of the 8x8 other modes to 12.3%, compared to 4.2% in the i2iDST4+RDPCM system and 5.7% in the HEVCv2 system, which do not process these intra modes prior to entropy coding.

TABLE VII
PERCENTAGE OF PIXELS THAT ARE CODED IN EACH AVAILABLE BLOCK SIZE IN SEVERAL SYSTEMS (AVERAGE OVER ALL SEQUENCES)

Block size | HEVCv1 | HEVCv2 | i2iDST4+RDPCM | i2iDST4&8+RDPCM
4x4 | 89.6 | 77.2 | 88.1 | 79.4
8x8 | 8.6 | 17.7 | 8.7 | 17.4
16x16 | 1.8 | 5.1 | 3.2 | 3.2
32x32 | 0 | 0 | 0 | 0

TABLE VIII
DISTRIBUTION OF PERCENTAGES OF THE 4x4 AND 8x8 BLOCKS IN TABLE VII TO HORIZONTAL & VERTICAL AND REMAINING INTRA MODES

Intra modes | HEVCv1 | HEVCv2 | i2iDST4+RDPCM | i2iDST4&8+RDPCM
4x4 hor&ver | 15.0 | 41.2 | 25.9 | 23.9
4x4 other | 74.6 | 36.0 | 62.2 | 55.5
8x8 hor&ver | 0.6 | 12.0 | 4.5 | 5.1
8x8 other | 8.0 | 5.7 | 4.2 | 12.3

Fig. 12. Percentage of pixels that are coded in each available block size in several systems for all sequence classes. (Bars for HEVCv1, HEVCv2, i2iDST4+RDPCM and i2iDST4&8+RDPCM in Class A to Class E and on average; vertical axis: Percentage (%).)

VI. CONCLUSIONS

This paper explored an alternative approach for lossless intra-frame compression. A popular and computationally efficient approach, also used in H.264 and HEVC, is to skip transform and quantization but also process the residual block with DPCM, along the horizontal or vertical direction, prior to entropy coding. Instead, this paper processed the residual block with integer-to-integer (i2i) transforms. In particular, we developed novel i2i approximations of the odd type-3 DST (ODST-3) that can be applied to the residuals of all intra prediction modes in lossless intra-frame compression. Experimental results with the HEVC reference software showed that the developed i2i approximations of the ODST-3 improve lossless intra-frame compression efficiency with respect to HEVC version 2, which uses the popular DPCM method along the horizontal or vertical direction, by an average 2.7% without a significant effect on computational complexity.

ACKNOWLEDGMENT

We thank Vivek K Goyal for his valuable comments.

REFERENCES

[1] G. Sullivan, J. Ohm, W.-J. Han, and T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[2] T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, July 2003.
[3] Y.-L. Lee, K.-H. Han, and G. Sullivan, “Improved lossless intra coding for H.264/MPEG-4 AVC,” IEEE Transactions on Image Processing, vol. 15, no. 9, pp. 2610–2615, Sept. 2006.
[4] S.-W. Hong, J. H. Kwak, and Y.-L. Lee, “Cross residual transform for lossless intra-coding for HEVC,” Signal Processing: Image Communication, vol. 28, no. 10, pp. 1335–1341, 2013.
[5] G. Jeon, K. Kim, and J. Jeong, “Improved residual DPCM for HEVC lossless coding,” in 2014 27th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Aug. 2014, pp. 95–102.
[6] X. Cai and J. S. Lim, “Adaptive residual DPCM for lossless intra coding,” in IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2015, pp. 94100A–94100A.
[7] V. K. Goyal, “Transform coding with integer-to-integer transforms,” IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 465–473, Mar. 2000.
[8] M. Budagavi, A. Fuldseth, G. Bjontegaard, V. Sze, and M. Sadafale, “Core transform design in the High Efficiency Video Coding (HEVC) standard,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 1029–1041, Dec. 2013.
[9] J. Liang and T. Tran, “Fast multiplierless approximations of the DCT with the lifting scheme,” IEEE Transactions on Signal Processing, vol. 49, no. 12, pp. 3032–3044, Dec. 2001.
[10] F. Kamisli, “Lossless compression in HEVC with integer-to-integer transforms,” in 2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP). IEEE, 2016, pp. 1–6.
[11] C. Yeo, Y. H. Tan, Z. Li, and S. Rahardja, “Mode-dependent transforms for coding directional intra prediction residuals,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 4, pp. 545–554, April 2012.
[12] J. Han, A. Saxena, V. Melkote, and K. Rose, “Jointly optimized spatial prediction and block transform for video and image coding,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1874–1884, 2012.
[13] C. Ying and P. Hao, “Integer reversible transformation to make JPEG lossless,” in 2004 7th International Conference on Signal Processing (ICSP ’04), vol. 1, Aug. 2004, pp. 835–838.
[14] W. Philips, “The lossless DCT for combined lossy/lossless image coding,” in 1998 International Conference on Image Processing (ICIP 98), vol. 3, Oct. 1998, pp. 871–875.
[15] F. Kamisli, “Lossless intra coding in HEVC with integer-to-integer DST,” in 2016 24th European Signal Processing Conference (EUSIPCO). IEEE, 2016, pp. 2440–2444.
[16] D. Flynn, D. Marpe, M. Naccari, T. Nguyen, C. Rosewarne, K. Sharman, J. Sole, and J. Xu, “Overview of the range extensions for the HEVC standard: Tools, profiles, and performance,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 1, pp. 4–19, Jan. 2016.
[17] S. Lee, I. Kim, and K. C, “AHG7: Residual DPCM for HEVC lossless coding,” JCTVC-L0117, Geneva, Switzerland, pp. 1–10, January 2013.
[18] M. Zhou, W. Gao, M. Jiang, and H. Yu, “HEVC lossless coding and improvements,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1839–1843, Dec. 2012.
[19] K. Kim, G. Jeon, and J. Jeong, “Piecewise DC prediction in HEVC,” Signal Processing: Image Communication, vol. 29, no. 9, pp. 945–950, 2014.
[20] E. Wige, G. Yammine, P. Amon, A. Hutter, and A. Kaup, “Pixel-based averaging predictor for HEVC lossless coding,” in 2013 20th IEEE International Conference on Image Processing (ICIP), Sept. 2013, pp. 1806–1810.
[21] S. R. Alvar and F. Kamisli, “On lossless intra coding in HEVC with 3-tap filters,” Signal Processing: Image Communication.
[22] … Signal Processing: Image Communication, vol. 25, no. 9, pp. 687–696, 2010.
[23] S. Kim and A. Segall, “Simplified CABAC for lossless compression,” JCTVC-H0499, San José, CA, USA, pp. 1–10, 2012.
[24] J.-A. Choi and Y.-S. Ho, “Efficient residual data coding in CABAC for HEVC lossless video compression,” Signal, Image and Video Processing, vol. 9, no. 5, pp. 1055–1066, 2015.
[25] J. Sole, R. Joshi, and K. M, “RCE2 Test B.1: Residue rotation and significance map context,” JCTVC-N0044, Vienna, Austria, pp. 1–10, July 2013.
[26] W. Sweldens, “The lifting scheme: A custom-design construction of biorthogonal wavelets,” Applied and Computational Harmonic Analysis, vol. 3, no. 2, pp. 186–200, 1996.
[27] W.-H. Chen, C. Smith, and S. Fralick, “A fast computational algorithm for the discrete cosine transform,” IEEE Transactions on Communications, vol. 25, no. 9, pp. 1004–1009, Sep. 1977.
[28] C. Loeffler, A. Ligtenberg, and G. S. Moschytz, “Practical fast 1-D DCT algorithms with 11 multiplications,” in 1989 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-89), vol. 2, May 1989, pp. 988–991.
[29] J. Lainema, F. Bossen, W.-J. Han, J. Min, and K. Ugur, “Intra coding of the HEVC standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1792–1801, Dec. 2012.
[30] M. Flickner and N. Ahmed, “A derivation for the discrete cosine transform,” Proceedings of the IEEE, vol. 70, no. 9, pp. 1132–1134, 1982.
[31] A. K. Jain, “A sinusoidal family of unitary transforms,” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 4, pp. 356–365, 1979.
[32] V. K. Goyal, “Theoretical foundations of transform coding,” IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 9–21, 2001.
[33] J. Han, Y. Xu, and D. Mukherjee, “A butterfly structured design of the hybrid transform coding scheme,” in Picture Coding Symposium (PCS), 2013. IEEE, 2013, pp. 17–20.
[34] J. S. Lim, Two-Dimensional Signal and Image Processing. Prentice Hall, 1990.
[35] H. Chen and B. Zeng, “New transforms tightly bounded by DCT and KLT,” IEEE Signal Processing Letters, vol. 19, no. 6, pp. 344–347, June 2012.
[36] Z. Wang, “Fast algorithms for the discrete W transform and for the discrete Fourier transform,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 4, pp. 803–816, 1984.
[37] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, “Low-complexity transform and quantization in H.264/AVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 598–603, 2003.
[38] “HM reference software (HM-15.0+RExt-8.1),” https://hevc.hhi.fraunhofer.de/trac/hevc/browser/tags/HM-15.0+RExt-8.1, accessed: 2016-01-01.
[39] F. Bossen, “Common test conditions and software reference configurations,” Joint Collaborative Team on Video Coding (JCT-VC), JCTVC-F900, 2011.
[40] F. Kamisli, “Intra prediction based on Markov process modeling of images,” IEEE Transactions on Image Processing, vol. 22, no. 10, pp. 3916–3925, Oct. 2013.
[41] A. Saxena and F. C. Fernandes, “Mode dependent DCT/DST for intra prediction in block-based image/video coding,” in 2011 18th IEEE International Conference on Image Processing (ICIP). IEEE, 2011, pp. 1685–1688.