New Transforms for JPEG Format
Stanislav Svoboda, David Barina ∗
Faculty of Information Technology, Brno University of Technology, Bozetechova 1 /
Abstract
The two-dimensional discrete cosine transform (DCT) lies at the heart of many image compression algorithms. Specifically, the JPEG format uses a lossy form of compression based on that transform. Since the standardization of JPEG, many other transforms have become practical in lossy data compression. This article analyzes the use of these transforms as a replacement for the DCT in the JPEG compression chain. Each transform is examined on different image datasets and subsequently compared to the other transforms using the peak signal-to-noise ratio (PSNR). Our experiments show that an overlapping variation of the DCT, the local cosine transform (LCT), outperforms the original block-wise transform at low bitrates. At high bitrates, the discrete wavelet transform employing the Cohen–Daubechies–Feauveau (CDF) 9/7 wavelet offers about the same compression performance as the DCT.
Keywords: JPEG, lossy image compression, transform coding, discrete cosine transform, discrete wavelet transform
1. Introduction
In recent decades, the demand for high-quality photography has been growing, and with it the demand for efficient data storage. It is therefore important to compress the data as much as possible while preserving the quality of the image. For example, transferring a large number of high-resolution images across the Internet without a certain level of compression would be very time-consuming. For photography, the problem can be addressed by lossy image compression. Nowadays, the JPEG standard [1], dating back to 1991, is still the most widely used format for lossy compression. Figure 1 shows the underlying compression chain. Each color component is transformed in blocks of 8 × 8 samples using the DCT. For a typical image, the DCT concentrates most of the information in just a few coefficients. This allows for better image compression. The transform coefficients are further quantized and fed into an entropy coder.

Since then, several other lossy image compression formats have been standardized. However, none of them has become more popular with the public than the original JPEG. In particular, JPEG 2000 [2] decomposes large image tiles using the discrete wavelet transform (DWT). The advantage of wavelets is that they are localized in a small area of the image domain. Another interesting standard is JPEG XR [3], which is based on an overlapping hierarchical transform, the so-called lapped biorthogonal transform (LBT). The last of the standards to be mentioned is WebP [4], based on the DCT complemented by the Walsh–Hadamard transform (WHT).

Figure 1 shows that the JPEG block-wise scheme is very general. This opens the way to incorporating other suitable transforms into the same compression chain. This is the motivation behind our research.

The rest of the paper is organized as follows. Section 2 presents the JPEG chain in the necessary level of detail. Section 3 deals with the transforms suitable for use in this chain and examines their compression capabilities. Finally, Section 4 summarizes and closes the paper.

∗ Corresponding author
Email addresses: [email protected] (Stanislav Svoboda), [email protected] (David Barina)
2. JPEG Format
Part 1 of the JPEG standard [1] specifies a method of lossy compression for digital images based on the discrete cosine transform (DCT). This section gives a simplified description of JPEG image compression.

The color model used is YCbCr. Therefore, the representation of the colors in the image is first converted from RGB to YCbCr. The transformation into the YCbCr model enables the next usual step, which is to reduce the spatial resolution of the Cb and Cr components. For the rest of the compression process, the Y, Cb, and Cr components are processed separately.

Figure 1: JPEG overview. The dashed line indicates the compression chain. Labels in the diagram: input component, DCT, quantization, quantization table, RLE, Huffman coder, Huffman tables, compressed bitstream.

Preprint submitted to SCCG 2017, May 11, 2017

As a next step, each component is split into blocks of 8 × 8 samples $x_{k,l}$ for $(0,0) \le (k,l) < (8,8)$. Each block is then transformed by the two-dimensional DCT

$$ X_{m,n} = \lambda_{m,n} \sum_{k,l} \cos\left(\frac{\pi (k + 1/2)\, m}{N}\right) \cos\left(\frac{\pi (l + 1/2)\, n}{N}\right) x_{k,l}, \tag{1} $$

where $(0,0) \le (m,n) < (8,8)$, $\lambda_{m,n}$ is a scaling factor, and $N = 8$. Subsequently, the transform coefficients are quantized. When performing a block-based transform and quantization, several types of artifacts can appear, especially blocking artifacts. The blocking artifacts are shown in Figure 2. The artifacts can be reduced by choosing a finer quantization, which corresponds to a lower level of compression.

Figure 2: Blocking artifacts caused by the JPEG compression.
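As an illustration of equation (1), the following Python sketch evaluates the transform of one 8 × 8 block directly from the double sum, together with its inverse. The orthonormal choice of the scaling factor ($\lambda_{m,n} = c_m c_n$ with $c_0 = \sqrt{1/N}$ and $c_m = \sqrt{2/N}$ otherwise) is an assumption, since the text only states that $\lambda_{m,n}$ is a scaling factor. The function names and the test block are illustrative and do not come from any JPEG library.

```python
import math

N = 8  # JPEG block size


def scale(m):
    # Assumed orthonormal scaling; the paper only says "a scaling factor".
    return math.sqrt(1.0 / N) if m == 0 else math.sqrt(2.0 / N)


def basis(k, m):
    # 1-D cosine from equation (1): cos(pi * (k + 1/2) * m / N)
    return math.cos(math.pi * (k + 0.5) * m / N)


def dct2(x):
    """Forward 2-D DCT of an N x N block x[k][l], straight from equation (1)."""
    return [[scale(m) * scale(n) * sum(basis(k, m) * basis(l, n) * x[k][l]
                                       for k in range(N) for l in range(N))
             for n in range(N)] for m in range(N)]


def idct2(X):
    """Inverse transform: the same products summed over (m, n) instead of (k, l)."""
    return [[sum(scale(m) * scale(n) * basis(k, m) * basis(l, n) * X[m][n]
                 for m in range(N) for n in range(N))
             for l in range(N)] for k in range(N)]


# Hypothetical test block: a horizontal ramp centred around zero.
block = [[float(8 * l - 28) for l in range(N)] for k in range(N)]
coeffs = dct2(block)
restored = idct2(coeffs)
max_error = max(abs(block[k][l] - restored[k][l])
                for k in range(N) for l in range(N))
```

Because the test block varies only horizontally, all coefficients with m > 0 come out as zero up to rounding error, which is the kind of energy compaction the compression chain relies on. A practical codec would use a fast separable algorithm rather than this direct evaluation.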
The DCT itself is a lossless process, since the original input can be exactly reconstructed by applying the inverse transform to the coefficients $X_{m,n}$ directly. In order to achieve a substantial compression ratio, quantization is applied to reduce the number of levels of the coefficients. A uniform quantization procedure is used. One of up to four quantization tables $Q_{m,n}$ may be used in the quantization procedure. No default quantization table is specified in the standard. The quantization is formulated as

$$ \hat{X}_{m,n} = \operatorname{round}\left(\frac{X_{m,n}}{Q_{m,n}}\right), \tag{2} $$

where the round($a$) operator rounds the value $a$ to the nearest integer. The human visual system is more tolerant of a loss of higher spatial frequency components than of a loss of lower frequency components. This allows quantization to greatly reduce the amount of information in the high-frequency components.

After quantization, the $\hat{X}_{m,n}$ coefficients are fed into an entropy coder. Entropy coding employed in JPEG is a special form of lossless compression. The $\hat{X}_{0,0}$ coefficient (DC coefficient) is treated differently than the other coefficients (AC coefficients). The latter are converted into a one-dimensional "zig-zag" sequence. The rest of the process involves run-length encoding (RLE) of zeros followed by Huffman coding (arithmetic coding is possible, but rarely used).

From the above, it is clear that the scheme is almost independent of the transform used. Consequently, it seems logical to substitute the DCT with some other similar transform. Several papers on this topic have already been published. Some of them are briefly reviewed below. The authors of [5] examined the possibility of using the discrete Chebyshev transform (DChT) in JPEG. As reported in their paper, the DChT outperforms the DCT on images with sharp edges and high predictability. In [6], the author compared the compression performance of the block-wise DCT against several lapped transforms. He concluded that lapped transforms exhibit less blocking than the DCT and show some PSNR improvement over the DCT.

Considering the existing papers, we see that a wider comparison of the transforms in the JPEG compression chain is missing. The next section investigates the performance of some promising transforms in conjunction with the JPEG compression.
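The quantization of equation (2) and the zig-zag scan with run-length encoding of zeros described in this section can be sketched as follows. This is a simplified illustration: the flat quantization table, the tie-breaking rule of the rounding, and the "EOB" end marker are assumptions made here. The actual JPEG entropy coder additionally limits run lengths and maps the resulting pairs to Huffman codes.

```python
import math


def quantize(X, Q):
    """Equation (2): divide each coefficient by its table entry and round.
    Ties are rounded up here (floor(a + 0.5)), which is an assumption."""
    return [[math.floor(x / q + 0.5) for x, q in zip(xr, qr)]
            for xr, qr in zip(X, Q)]


def zigzag_order(n=8):
    """Coordinates (row, col) of an n x n block in zig-zag order."""
    def key(rc):
        r, c = rc
        s = r + c  # index of the anti-diagonal
        # Odd diagonals are walked downwards, even ones upwards.
        return (s, r if s % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_length(quantized):
    """Return the DC value and (zero_run, value) pairs for the AC coefficients."""
    order = zigzag_order(len(quantized))
    dc = quantized[0][0]
    pairs, run = [], 0
    for r, c in order[1:]:
        v = quantized[r][c]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append("EOB")  # only zeros remain up to the end of the block
    return dc, pairs


# Hypothetical coefficients and a flat table, chosen only for illustration.
X = [[0.0] * 8 for _ in range(8)]
X[0][0], X[0][1], X[2][0] = 80.0, -33.0, 12.4
Q = [[10] * 8 for _ in range(8)]
dc, pairs = run_length(quantize(X, Q))
```

With these inputs the block reduces to one DC value, two (run, value) pairs, and the end marker, which shows why long runs of zeroed high-frequency coefficients are cheap to code.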
3. New Transforms for JPEG Format
This section interleaves a description of the transforms with their evaluation. The evaluation was performed on two datasets [7, 8]. First, trigonometric transforms are investigated. Subsequently, separable and non-separable wavelet, Chebyshev, and Walsh–Hadamard transforms are examined.
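The evaluation in this section is reported as PSNR plotted against bits per pixel. A minimal sketch of how these two quantities are commonly computed is given below. The peak value of 255 assumes 8-bit samples, and the helper names are illustrative rather than taken from the authors' implementation.

```python
import math


def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


def bits_per_pixel(compressed_bytes, width, height):
    """Size of the compressed stream relative to the number of pixels."""
    return 8.0 * compressed_bytes / (width * height)
```

A higher PSNR at the same bits-per-pixel value means that a transform preserves more of the image for the same storage cost, which is how the curves in the figures of this section are read.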
The discrete sine transform (DST) is very similar to the DCT, except that cosines are replaced with sines. Recall that the DCT has the property that, for a typical image, most of the information is concentrated in just a few coefficients $X_{m,n}$ with the lowest $(m,n)$ indices. However, this property does not hold for all sine transforms. We found one variant for which the property holds. In the literature, this variant is referred to as the DST-VII [9]. Since most of the transforms investigated in this paper are separable, only the one-dimensional definitions are given from now on. The DST is defined by

$$ X_m = \lambda_m \sum_k \sin\left(\frac{\pi (k+1)(m+1/2)}{N+1/2}\right) x_k, \tag{3} $$

where $\lambda_m$ is a scaling factor.

Figure 3: Comparison of the DCT, DST, DHT, and LCT (PSNR [dB] versus BPP; the LCT is labeled LDCT in the plot). The LCT outperforms the other transforms at low bitrates.
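Equation (3) can be checked numerically with the short sketch below, which builds the N × N DST-VII matrix and applies it to a vector. The scaling $\lambda_m = 2/\sqrt{2N+1}$, which makes the matrix orthonormal, is an assumption, since the paper leaves $\lambda_m$ unspecified. The function names and test samples are illustrative.

```python
import math


def dst7_matrix(n):
    """T[m][k] = lam * sin(pi * (k + 1) * (m + 1/2) / (n + 1/2)), as in equation (3)."""
    lam = 2.0 / math.sqrt(2 * n + 1)  # assumed orthonormal scaling
    return [[lam * math.sin(math.pi * (k + 1) * (m + 0.5) / (n + 0.5))
             for k in range(n)] for m in range(n)]


def forward(T, x):
    """X_m = sum_k T[m][k] * x_k."""
    return [sum(T[m][k] * x[k] for k in range(len(x))) for m in range(len(T))]


def inverse(T, X):
    """For an orthonormal matrix the inverse is the transpose."""
    return [sum(T[m][k] * X[m] for m in range(len(T))) for k in range(len(X))]


T = dst7_matrix(8)
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # arbitrary test samples
x_back = inverse(T, forward(T, x))
```

With the assumed scaling the rows of the matrix are orthonormal, so the same table serves the forward and the inverse transform.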
Like the previous transforms, the discrete Hartley transform (DHT) [10] is also based on trigonometric functions. In fact, its definition looks very similar to the definition of the discrete Fourier transform (DFT). Unlike the DFT, the discrete Hartley transform maps real inputs onto real outputs, with no involvement of complex numbers. The transform is defined by

$$ X_m = \sum_k \operatorname{cas}\left(\frac{2 \pi k m}{N}\right) x_k, \tag{4} $$

where $\operatorname{cas}(\alpha) = \cos(\alpha) + \sin(\alpha)$.

The local cosine transform (LCT) [11] reduces and smooths the block effects. It is based on the standard block-based DCT. However, the local cosine transform has basis functions that overlap adjacent blocks. Prior to the DCT algorithm, a preprocessing phase is applied in which the image is multiplied by smooth bell functions that overlap adjacent blocks. This phase is implemented by folding the overlapping parts of the bells back into the original blocks. The standard block-based DCT algorithm then operates on the resulting blocks.

The folding operations are defined as

$$ f_-(n) = \frac{b(n)\, f(-n) - b(-n)\, f(n)}{b(n) - b(-n)}, \tag{5} $$

$$ f_+(n) = \frac{b(n)\, f(n) - b(-n)\, f(-n)}{b(n) - b(-n)}, \tag{6} $$

where $f_-(n)$ is the $n$th coefficient to the left (top) of the current block, $f_+(n)$ is the $n$th coefficient to the right (bottom), and $b(n) = \beta((2n+1)/N)$ is a bell function, where

$$ \beta(n) = \begin{cases} 0 & n < -1, \\ \left(1 + \sin(\pi n / 2)\right)/2 & \text{otherwise}, \\ 1 & n > +1. \end{cases} \tag{7} $$

The transforms are compared in Figure 3, where the x-axis indicates bits per pixel (bpp). The discrete sine transform performs significantly worse than the reference DCT. This is caused by artifacts on block boundaries, as shown in Figure 4. The discrete Hartley transform also performs worse than the DCT. As we have found, this is caused by blocking artifacts at higher bitrates, where such artifacts are no longer visible with the DCT. At the lowest bitrates, the local cosine transform yields better image quality than the reference DCT. Unfortunately, at higher bitrates, its image quality is slightly worse. The improvement at lower bitrates is due to reduced blocking artifacts.

Figure 4: Sample image (on the left) and DST artifacts on block boundaries (on the right).
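The folding of equations (5) and (6) can be sketched for a single block boundary as follows. Several points are assumptions of this sketch rather than statements of the paper: the bell is taken as β(t) = (1 + sin(πt/2))/2 on [−1, 1], b(−n) is read as β(−(2n+1)/N), f(n) and f(−n) are read as the n-th samples to the right and to the left of the boundary (nearest first), and half a block on each side takes part in the folding. The function names are hypothetical.

```python
import math

N = 8  # block size; N // 2 samples on each side of a boundary are folded


def beta(t):
    """Assumed bell: 0 below -1, 1 above +1, smooth sine ramp in between."""
    if t < -1.0:
        return 0.0
    if t > 1.0:
        return 1.0
    return 0.5 * (1.0 + math.sin(math.pi * t / 2.0))


def fold(left, right):
    """left[n] = f(-n), right[n] = f(n), both ordered from the boundary outwards.
    Returns the folded samples (f_minus, f_plus) of equations (5) and (6)."""
    f_minus, f_plus = [], []
    for n in range(N // 2):
        t = (2 * n + 1) / N
        bp, bm = beta(t), beta(-t)  # b(n) and b(-n)
        f_minus.append((bp * left[n] - bm * right[n]) / (bp - bm))
        f_plus.append((bp * right[n] - bm * left[n]) / (bp - bm))
    return f_minus, f_plus


def unfold(f_minus, f_plus):
    """Inverse of fold(), obtained by solving the 2 x 2 system for each n."""
    left, right = [], []
    for n in range(N // 2):
        t = (2 * n + 1) / N
        bp, bm = beta(t), beta(-t)
        left.append((bp * f_minus[n] + bm * f_plus[n]) / (bp + bm))
        right.append((bm * f_minus[n] + bp * f_plus[n]) / (bp + bm))
    return left, right
```

Under these assumptions a constant signal passes through the folding unchanged, and `unfold` restores the original samples, so the preprocessing step by itself loses no information. Any loss comes from the quantization that follows the DCT.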
The discrete wavelet transform (DWT) has become a very popular image processing tool in recent decades. For example, the JPEG 2000 standard is based on such a decomposition technique. In more detail, the DWT decomposes the image into several subbands, while employing simple basis functions, the wavelets [12]. The transform is usually applied to large image tiles instead of small 8 × 8 blocks. Two Cohen–Daubechies–Feauveau (CDF) wavelets are considered here, the CDF 5/3 and the CDF 9/7. In our case, the transform is applied to blocks of 8 × 8 coefficients. This design corresponds to three levels of a dyadic decomposition [14]. The coordinates of the coefficients in the 8 × 8 block are arranged so that the coefficients more closely correspond to the DCT coefficients. Note that both of the transforms were implemented using a lifting scheme [15, 16]. The lifting scheme can decompose the wavelet transforms into a finite sequence of simple filtering steps (lifting steps). Usually, the first step in a pair is referred to as the predict and the second one as the update.

The red-black wavelet transform (DWT RB) [17] is computed using a 2-D lifting scheme on a quincunx lattice [18]. The wavelets constructed in this way are inherently non-separable. Consequently, the red-black wavelets are less anisotropic than the classical tensor product wavelets (the classical DWT). In other words, the classical DWT favors horizontal and vertical features of the image. This is not visible in the red-black wavelet transform.

Figure 5: Lifting scheme on the quincunx lattice. Filter samples are bordered in blue. The first step (predict) on the left, the second (update) on the right.

The construction of the red-black wavelets is based on the CDF wavelets above. Therefore, the CDF 5/3 and CDF 9/7 variants of the red-black transform are examined as well. The separable and non-separable transforms are compared in Figure 6. The best combination seems to be the separable CDF 9/7 transform.

Figure 6: Comparison of the separable and non-separable (red-black) wavelet transforms (PSNR [dB] versus BPP; series DWT 5/3, DWT 9/7, DWT RB 5/3, DWT RB 9/7). The separable CDF 9/7 transform performs best.

Figure 7: Comparison of the non-separable CDF 9/7 […]

The discrete Chebyshev transform (DChT) [19] is a polynomial-based transform, which employs Chebyshev polynomials of the first kind $T_n(x)$. Since the DCT is closely associated with a Chebyshev polynomial series through $\cos(n\alpha) = T_n(\cos(\alpha))$ for some $\alpha$, the discrete Chebyshev transform can be viewed as a natural modification of the DCT. The discrete Chebyshev transform is defined using the polynomials

$$ t_p(x) = (A_1 x + A_2)\, t_{p-1}(x) + A_3\, t_{p-2}(x), \tag{8} $$

where $A_1$, $A_2$, and $A_3$ are constants. The transform is then defined as

$$ X_m = \sum_k t_m(k)\, x_k. \tag{9} $$

The last of the transforms discussed in this paper is the Walsh–Hadamard transform (WHT). The computation [20] of this transform should be very fast, since only additions and subtractions are involved. The transform is defined as

$$ X_m = \frac{1}{N} \sum_k W(m,k)\, x_k, \tag{10} $$

where

$$ W(m,k) = \begin{bmatrix}
+ & + & + & + & + & + & + & + \\
+ & + & + & + & - & - & - & - \\
+ & + & - & - & + & + & - & - \\
+ & + & - & - & - & - & + & + \\
+ & - & + & - & + & - & + & - \\
+ & - & + & - & - & + & - & + \\
+ & - & - & + & + & - & - & + \\
+ & - & - & + & - & + & + & -
\end{bmatrix}, \tag{11} $$

with $+$ and $-$ denoting $+1$ and $-1$.

Figure 8 shows the performance of the discrete Chebyshev transform. It is evident that the DChT is slightly below the DCT everywhere. Finally, Figure 9 shows an overall comparison, including the Walsh–Hadamard transform. The WHT also does not outperform the DCT in any part of the plot. The only advantage of the WHT is its computational performance. In summary, only the local cosine transform (LCT) outperformed the original block-wise DCT, especially at low bitrates. In addition, it removes blocking artifacts, as documented in Figures 10 and 12. Additionally, the separable discrete wavelet transform with the CDF 9/7 wavelet offers about the same compression performance as the DCT at high bitrates. To give a better intuition for the evaluated transforms, the bases of selected transforms are visually compared in Figure 11.

Figure 8: Comparison of the DCT and DChT (PSNR [dB] versus BPP). The DCT is slightly better.

Figure 9: Overall comparison of selected transforms (PSNR [dB] versus BPP; series DCT, WHT, LDCT, DWT 9/7).
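The predict and update steps of the lifting scheme used for the separable wavelet transforms in this section can be illustrated on the CDF 5/3 wavelet in one dimension. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it uses the commonly published 5/3 lifting coefficients (1/2 for predict, 1/4 for update), symmetric extension at the borders, no final scaling step, and three decomposition levels on eight samples. The function names are hypothetical.

```python
def forward_step(x):
    """One level of CDF 5/3 lifting on an even-length list.
    Returns (approximation, detail)."""
    even, odd = x[0::2], x[1::2]
    h = len(odd)
    # Predict: each odd sample is predicted by the mean of its even neighbours.
    d = [odd[i] - 0.5 * (even[i] + even[min(i + 1, h - 1)]) for i in range(h)]
    # Update: even samples are corrected by a quarter of the neighbouring details.
    s = [even[i] + 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(h)]
    return s, d


def inverse_step(s, d):
    """Undo forward_step by running the lifting steps backwards."""
    h = len(d)
    even = [s[i] - 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(h)]
    odd = [d[i] + 0.5 * (even[i] + even[min(i + 1, h - 1)]) for i in range(h)]
    x = [0.0] * (2 * h)
    x[0::2], x[1::2] = even, odd
    return x


def dwt53(x, levels=3):
    """Dyadic decomposition: [approximation, coarsest details, ..., finest details]."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = forward_step(approx)
        details = d + details
    return approx + details


def idwt53(c, levels=3):
    """Inverse of dwt53 for the same number of levels."""
    n = len(c) >> levels  # length of the coarsest approximation
    approx = c[:n]
    for _ in range(levels):
        approx = inverse_step(approx, c[n:2 * n])
        n *= 2
    return approx
```

For a constant input, every detail coefficient is zero and the single approximation coefficient equals the constant. In two dimensions, the same steps would be applied along the rows and then the columns of each 8 × 8 block. The CDF 9/7 wavelet follows the same pattern with two predict and update pairs and different coefficients.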
4. Conclusions
The JPEG image compression format uses a lossy form of compression based on the discrete cosine transform. This paper deals with the substitution of the discrete cosine transform in the JPEG compression chain with some other similar transform. Several practical transforms were examined, including other trigonometric transforms, separable and non-separable wavelet transforms, a transform employing Chebyshev polynomials, and the Walsh–Hadamard transform. These transforms were evaluated on two image datasets [7, 8].

The experiments show that only the local cosine transform outperforms the original block-wise DCT at low bitrates. In addition, it removes blocking artifacts. At high bitrates, the CDF 9/7 wavelet transform offers about the same compression performance as the DCT.
Acknowledgements
This work has been supported by the Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project IT4Innovations excellence in science (no. LQ1602), and the Technology Agency of the Czech Republic (TA CR) Competence Centres project V3C – Visual Computing Competence Center (no. TE01020415).
References
[1] ITU-T Recommendation T.81. Information technology – Digital compression and coding of continuous-tone still images – Requirements and guidelines; 1992.
[2] ITU-T Recommendation T.800. Information technology – JPEG 2000 image coding system: Core coding system; 2000.
[3] ITU-T Recommendation T.832. Information technology – JPEG XR image coding system – Image coding specification; 2009.
[4] RFC 6386. VP8 Data Format and Decoding Guide; 2011.
[5] Mukundan R. Transform coding using discrete Tchebichef polynomials. In: Proceedings of the 6th IASTED International Conference on Visualization, Imaging, and Image Processing, VIIP. ISBN 978-088986598-3; 2006, p. 270–5.
[6] Malvar HS. Lapped biorthogonal transforms for transform coding with reduced blocking and ringing artifacts. In: IEEE International Conference on Acoustics, Speech, and Signal Processing; vol. 3. 1997, p. 2421–4. doi:10.1109/ICASSP.1997.599545.
[7] Olmos A, Kingdom FAA. A biologically inspired algorithm for the recovery of shading and reflectance images. Perception 2004;33(12):1463–73. doi:10.1068/p5321. PMID: 15729913.
[8] Franzen R. Kodak lossless true color image suite. Released by the Eastman Kodak Company. URL http://r0k.us/graphics/kodak/.
[9] Puschel M, Moura JMF. The algebraic approach to the discrete cosine and sine transforms and their fast algorithms. SIAM Journal on Computing 2003;32(5):1280–316. doi:10.1137/S009753970139272X.
[10] Bracewell RN. Discrete Hartley transform. J Opt Soc Am 1983;73(12):1832–5. doi:10.1364/JOSA.73.001832.
[11] Aharoni G, Averbuch A, Coifman R, Israeli M. Local Cosine Transform – A Method for the Reduction of the Blocking Effect in JPEG. Boston, MA: Springer US. ISBN 978-1-4615-3260-6; 1993, p. 7–38. doi:10.1007/[…]
[12] […] CBMS-NSF regional conference series in applied mathematics. Philadelphia, Pennsylvania: Society for Industrial and Applied Mathematics; 1992. ISBN 0898712742. doi:10.1137/[…]
[13] […] doi:[…]/cpa.3160450502.
[14] Mallat S. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 1989;11(7):674–93. doi:10.1109/[…]
[15] […] doi:[…]/BF02476026.
[16] Sweldens W. The lifting scheme: A custom-design construction of biorthogonal wavelets. Applied and Computational Harmonic Analysis 1996;3(2):186–200. doi:10.1006/acha.1996.0015.
[17] Uytterhoeven G, Bultheel A. The Red-Black wavelet transform. In: Signal Processing Symposium (IEEE Benelux). IEEE Benelux Signal Processing Chapter; 1998, p. 191–4.
[18] Feilner M, Ville DVD, Unser M. An orthogonal family of quincunx wavelets with continuously adjustable order. IEEE Transactions on Image Processing 2005;14(4):499–510. doi:10.1109/TIP.2005.843754.
[19] Corr P, Stewart D, Hanna P, Ming J, Smith FJ. Discrete Chebyshev transform. A natural modification of the DCT. In: Proceedings 15th International Conference on Pattern Recognition. ICPR; vol. 3. 2000, p. 1142–5. doi:10.1109/ICPR.2000.903746.
[20] Shanks JL. Computation of the fast Walsh-Fourier transform. IEEE Transactions on Computers 1969;C-18(5):457–9. doi:10.1109/T-C.1969.222685.

Figure 10: Visual comparison of the original image (left), DCT (middle, PSNR 23.0 dB), and LCT (right, PSNR 23.3 dB).
Figure 11: Basis images of selected transforms, from the left: the DCT, DChT, DHT, and WHT. The DC coefficient is located in the top left corner.
Figure 12: Blocking artifacts, from the left: the original image, DCT, and LCT.