Tchebichef Transform Domain-based Deep Learning Architecture for Image Super-resolution
Ahlad Kumar and Harsh Vardhan Singh
Abstract—The recent outbreak of COVID-19 has motivated researchers to contribute to the area of medical imaging using artificial intelligence and deep learning. Super-resolution (SR) has, in the past few years, produced remarkable results using deep learning methods. The ability of deep learning methods to learn the non-linear mapping from low-resolution (LR) images to their corresponding high-resolution (HR) images leads to compelling results for SR in diverse areas of research. In this paper, we propose a deep learning based image super-resolution architecture in the Tchebichef transform domain. This is achieved by integrating a transform layer into the proposed architecture through a customized Tchebichef convolutional layer (TCL). The role of the TCL is to convert the LR image from the spatial domain to the orthogonal transform domain using Tchebichef basis functions. The inverse of this transformation is achieved using another layer, known as the inverse Tchebichef convolutional layer (ITCL), which converts the LR images back from the transform domain to the spatial domain. Using the Tchebichef transform domain for the task of SR takes advantage of the high- and low-frequency representation of images, which simplifies the task of super-resolution. We further introduce a transfer learning approach to enhance the quality of COVID-19 medical images. It is shown that our architecture enhances the quality of X-ray and CT images of COVID-19, providing better image quality that helps in clinical diagnosis. Experimental results obtained using the proposed Tchebichef transform domain super-resolution (TTDSR) architecture are competitive with those of most deep learning methods while using fewer trainable parameters.
Index Terms—Image Super-Resolution (SR), Tchebichef moments, Deep Learning, Convolutional Neural Network, COVID-19.
I. INTRODUCTION

THE coronavirus disease (COVID-19) is a newly emerging viral disease that caused a worldwide pandemic. The World Health Organization has listed it as the sixth international public health emergency. It has impacted around 170 countries and had taken the lives of almost 2 million people as of January 2021. COVID-19 diagnosis using X-ray and CT images is very popular, as it provides quick and highly accurate results. This paper proposes a deep learning based SR architecture to enhance the quality of COVID-19 medical images for clinical diagnosis.

Image Super-Resolution (SR) is one of the most famous and significant ill-posed problems, since multiple possible solutions can exist for a single image. It refers to the process of obtaining high-resolution (HR) images from the corresponding low-resolution (LR) images. Its use in a wide variety of applications, such as security, mobile cameras, and medical imaging [1], has attracted researchers over the past few years. SR problems are categorized into Single-Image Super-resolution (SISR) and Multi-Image Super-resolution (MISR) [2]-[5]. SISR aims to recover the HR image from a single LR image, and SISR-based methods have an edge over MISR-based methods, as they produce better perceptual image quality. Conventional SR methods use dictionary-based approaches, consisting of two dictionaries of HR and LR images or patches [6]-[9]. These dictionaries are often learned with sparse-coding methods to obtain high-quality SR images. Some methods also use dictionary features that need to be handcrafted [10]. Recent advancements in deep learning methods have shown state-of-the-art results in SR over different image datasets [11]. The earliest deep learning method was SRCNN [12], and an improved version was later reported in [13]. Recursive networks [14] and residual learning [15] have also been incorporated into deep learning architectures to boost the training process. We propose a new orthogonal-domain based deep learning architecture for image SR. It uses the Tchebichef moment to transform the spatial domain to the orthogonal domain, and then finds the difference between the Tchebichef coefficients of HR and LR image pairs.

Dr. Ahlad Kumar is an Assistant Professor at Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT) (e-mail: ahlad [email protected]). Harsh Vardhan is currently pursuing his Master's degree in Information and Communication Technology at Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, Gujarat (e-mail: [email protected]).
It has been observed that HR-LR image pairs have huge differences in the coefficient values at higher frequencies and small to negligible differences at the lower frequencies. This key observation plays a central role in developing the proposed architecture. The key aspects of this paper are summarized as follows:
• A deep neural network that solves the SR problem in an orthogonal transform domain is introduced. The architecture includes both the forward and inverse mappings, and hence provides a complete pipeline for image SR.
• The proposed architecture exploits the Tchebichef kernels to generate the representation of images in the transform domain. Two custom convolutional layers are designed: one for converting the image to the transform domain (TCL) and the other for the inverse transformation (ITCL). The TCL layer is kept fixed and non-trainable, while the ITCL is trainable, yielding optimized reconstruction kernels for converting from the transform domain back to the spatial domain.
• The proposed architecture consists of high- and low-frequency paths. The high-frequency path employs an Inception-ResNet based structure with a local residual connection to boost the training process. The low-frequency path employs a simple convolutional neural network (CNN) architecture.
• To handle the artifacts that occur during the reconstruction phase after the ITCL, additional convolutional layers are employed to process the spatial-domain images, resulting in enhanced SR images.

II. RELATED WORK
A. Single Image Super Resolution (SISR)
SISR poses a highly challenging problem due to its ill-posed nature: multiple solutions exist for one LR image or patch. SISR methods can mainly be divided into three categories: learning-, interpolation-, and reconstruction-based methods. Learning-based SISR methods [7], [16] are computationally fast and perform well; in particular, sparse coding has shown compelling results. The method involves two dictionaries of image patches, one for LR and one for HR. An LR image or patch is represented in terms of its sparse code over the LR dictionary; the corresponding HR image or patch is then generated by applying the same sparse code to the HR dictionary. Interpolation-based SISR methods, such as bicubic interpolation [17] and Lanczos resampling [18], are fast and straightforward but lack satisfactory accuracy. Reconstruction-based methods [19]-[22] are time-consuming; they draw on prior knowledge, but their SR image quality degrades heavily as the scaling factor increases.
B. Deep Learning Advancements in Image Super Resolution
Earlier, due to the lack of computational capabilities and less advanced architectures, conventional SR methods produced average results. Recent advancements, both in computational architectures and in research using deep neural networks, have produced state-of-the-art results in the SR domain. Dong et al. [12] introduced a CNN network for the SR problem, known as SRCNN. It involved a three-layer architecture that outperformed the previous methods based on sparse coding. SRCNN introduced a non-linear mapping function between the LR and HR images. Image patches are fed to the CNN to generate feature representations, followed by more convolutional layers that produce higher-level representations of the images. Although SRCNN produced great results, continuous advancements in the SR domain led to newer methods and architectures.

Since then, fully connected neural networks have also shown considerable improvements in SR by combining ideas from different architectures, such as sparse coding and residual learning. Prior-information-based deep neural networks have shown promising results: FSRNet [23] was used to generate human face SR images, and Wang et al. [24] focused on using structural feature priors for effective recovery of detailed texture features in an image.

A huge improvement in SR image quality was observed with Generative Adversarial Networks (GANs) [25]. Johnson et al. [26] focused on creating photo-realistic images with high perceptual quality, and less on maintaining the pixel-wise difference between images. GANs generate visually better images, but training a GAN is a complex and time-consuming process, making it unsuitable for many practical applications.

Li et al. [27] introduced a combined architecture using an image transform domain as well as a CNN, converting the input image/patch into the Fourier transform domain. The discrete Fourier transform (DFT) coefficients thus obtained are passed to a CNN architecture. As the convolution of kernels and an image in the spatial domain is equivalent to the multiplication of kernels and the image in the Fourier domain, the claim is that the CNN architecture performs element-wise multiplication to speed up the training process. Experimentally, it was observed that the performance was similar to previous works but not on par with state-of-the-art methods. Similar work is reported in [28], which uses a combination of a transform domain and a CNN: the images are transformed using the discrete cosine transform (DCT), and the pre- and post-processing of DCT and inverse DCT (IDCT) coefficients is incorporated within the CNN using custom convolutional layers.

This paper exploits the image transform domain using Tchebichef basis functions that are integrated as kernels within the CNN network. A custom convolutional layer (TCL) for transforming the image into the orthogonal domain, and an inverse transform layer (ITCL), are proposed. The transform-domain representation of an image is utilized to learn the high-frequency details that are lost during degradation by bicubic interpolation. The inverse transformation layer (ITCL) is kept trainable, so its kernels are optimized during the training process. The optimized kernels obtained after training provide the best possible reconstruction of the images from the corresponding transform domain. Moreover, reconstruction artifacts such as softness and haziness are removed with the help of additional layers incorporated in the proposed architecture. The main objective of this paper is to perform SR effectively while preserving the visual attributes of an image.

III. TCHEBICHEF MOMENTS
A. Computation of Tchebichef Moments
In this section, a brief review of the definition of Tchebichef moments [29] is given. They have been used recently in many pattern recognition [30]-[32] and denoising applications [33], [34]. Their robust feature-representation capability allows images to be reconstructed with promising results. The Tchebichef moment of order (m + n) for an image with intensity function g(x, y) is given as [29]

T_{n,m} = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \tilde{t}_m(x;N)\, \tilde{t}_n(y;N)\, g(x,y) \qquad (1)

with n, m = 0, 1, 2, \ldots, N-1 and g(x, y) an image of size N \times N. Here \tilde{t}_m(x;N) and \tilde{t}_n(y;N) are the normalized Tchebichef polynomials, given by

\tilde{t}_m(x;N) = \frac{t_m(x;N)}{\sqrt{\rho(m,N)}} \qquad (2)

and

\tilde{t}_n(y;N) = \frac{t_n(y;N)}{\sqrt{\rho(n,N)}} \qquad (3)

with

\rho(m,N) = (2m)!\binom{N+m}{2m+1}, \qquad \rho(n,N) = (2n)!\binom{N+n}{2n+1}

and t_n(x;N) the n-th order N-point Tchebichef polynomial, defined as

t_n(x;N) = n! \sum_{k=0}^{n} (-1)^{n-k} \binom{N-1-k}{n-k} \binom{n+k}{n} \binom{x}{k}

To simplify the notation, \tilde{t}_n(x) is used to represent \tilde{t}_n(x;N). The ortho-normal version of the Tchebichef polynomials can be calculated using the recurrence relation [29]

\tilde{t}_n(x) = \alpha_1 (2x+1-N)\, \tilde{t}_{n-1}(x) + \alpha_2\, \tilde{t}_{n-2}(x) \qquad (4)

where

\alpha_1 = \frac{1}{n}\sqrt{\frac{4n^2-1}{N^2-n^2}}, \qquad \alpha_2 = \frac{1-n}{n}\sqrt{\frac{2n+1}{2n-3}}\sqrt{\frac{N^2-(n-1)^2}{N^2-n^2}} \qquad (5)

The initial conditions for the above recurrence relationship are given as

\tilde{t}_0(x) = \frac{1}{\sqrt{N}}, \qquad \tilde{t}_1(x) = (2x+1-N)\sqrt{\frac{3}{N(N^2-1)}} \qquad (6)

The image can be reconstructed back from its Tchebichef moments using the inverse Tchebichef transformation, given by

g(x,y) = \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} \tilde{t}_m(x;N)\, \tilde{t}_n(y;N)\, T_{n,m} \qquad (7)
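The recurrence in (4)-(6) can be sketched directly in NumPy. The following is an illustrative implementation, not the authors' code; the orthonormality of the resulting polynomial matrix can be checked numerically.

```python
import numpy as np

def tchebichef_basis(N):
    """Rows n = 0..N-1 hold the normalized Tchebichef polynomials
    t~_n(x) of Eqs. (4)-(6), built with the three-term recurrence."""
    t = np.zeros((N, N))
    x = np.arange(N)
    t[0] = 1.0 / np.sqrt(N)                                   # Eq. (6)
    if N > 1:
        t[1] = (2 * x + 1 - N) * np.sqrt(3.0 / (N * (N**2 - 1)))
    for n in range(2, N):
        a1 = (1.0 / n) * np.sqrt((4 * n**2 - 1.0) / (N**2 - n**2))
        a2 = ((1.0 - n) / n) * np.sqrt((2 * n + 1.0) / (2 * n - 3.0)) \
             * np.sqrt((N**2 - (n - 1) ** 2) / (N**2 - n**2))
        t[n] = a1 * (2 * x + 1 - N) * t[n - 1] + a2 * t[n - 2]  # Eq. (4)
    return t
```

For N = 8, the rows of `tchebichef_basis(8)` are the eight polynomials whose outer products form the 8 × 8 basis kernels discussed in Sec. III-C.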
The Tchebichef moments in (1) can also be computed in matrix form. The set of Tchebichef moments up to order (m + n) in matrix form is given as

T = P G Q^{T} \qquad (8)

where G is the square image matrix, and P and Q are the Tchebichef polynomial matrices up to orders p and q, respectively, given as

P = \begin{bmatrix} \tilde{t}_0(0) & \cdots & \tilde{t}_0(N-1) \\ \vdots & \ddots & \vdots \\ \tilde{t}_p(0) & \cdots & \tilde{t}_p(N-1) \end{bmatrix} \qquad (9)

Q = \begin{bmatrix} \tilde{t}_0(0) & \cdots & \tilde{t}_0(N-1) \\ \vdots & \ddots & \vdots \\ \tilde{t}_q(0) & \cdots & \tilde{t}_q(N-1) \end{bmatrix} \qquad (10)

Similarly, the inverse transformation given in (7) can be represented in matrix form as

G = P^{T} T Q \qquad (11)
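As a sketch of (8) and (11): Gram-Schmidt orthonormalization of the monomials 1, x, ..., x^(N-1) over the grid {0, ..., N-1} yields the normalized Tchebichef polynomials up to sign, so a QR decomposition of the Vandermonde matrix is a convenient stand-in for P (with Q = P for a square image). This is an illustration, not the paper's construction.

```python
import numpy as np

N = 8
# QR of the Vandermonde matrix orthonormalizes the monomials over the
# grid {0..N-1}; its rows match t~_n(x) up to a sign convention.
V = np.vander(np.arange(N), N, increasing=True).astype(float)
Qf, _ = np.linalg.qr(V)
P = Qf.T                      # row n ~ t~_n(x), up to sign

G = np.random.default_rng(0).random((N, N))   # toy image block
T = P @ G @ P.T               # forward transform, Eq. (8) with Q = P
G_rec = P.T @ T @ P           # inverse transform, Eq. (11)
```

Because P is orthogonal, the inverse in (11) reconstructs G exactly.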
The Tchebichef moments of an image can be interpreted as the projection of the image onto the basis (kernel) functions w_{pq}, given as

w_{pq} = \tilde{t}_p^{T}\, \tilde{t}_q \qquad (12)

where

\tilde{t}_p = \begin{bmatrix} \tilde{t}_p(0) & \tilde{t}_p(1) & \cdots & \tilde{t}_p(N-1) \end{bmatrix}, \qquad \tilde{t}_q = \begin{bmatrix} \tilde{t}_q(0) & \tilde{t}_q(1) & \cdots & \tilde{t}_q(N-1) \end{bmatrix} \qquad (13)

The complete set of w_{pq} basis functions is shown in Fig. 1. Tchebichef moments can also be viewed as the correlation between the basis functions and the image G. A high value is recorded if there is a strong similarity between the content of the image and the basis function, and vice versa.

Fig. 1: Basis functions of the Tchebichef moment.
D. Basis Ordering and its Significance
In the proposed architecture, the Tchebichef basis functions are used as filters and are re-arranged in a zig-zag order, as shown in Fig. 2. This zig-zag reordering is inspired by the JPEG compression procedure [35].
Fig. 2: Zig-zag re-ordering of Tchebichef basis functions
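The zig-zag scan can be generated programmatically. The sketch below follows the JPEG convention of traversing anti-diagonals of increasing p + q in alternating directions; the exact scan direction of Fig. 2 is an assumption here.

```python
def zigzag_order(n=8):
    """(p, q) kernel indices in a JPEG-style zig-zag scan of an n x n
    grid: anti-diagonals of increasing p+q, alternating direction."""
    return sorted(((p, q) for p in range(n) for q in range(n)),
                  key=lambda s: (s[0] + s[1],
                                 s[0] if (s[0] + s[1]) % 2 else s[1]))
```

For n = 8 this yields the 64 indices used to relabel the basis functions as w_0, ..., w_63.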
Zig-zag ordering of the basis functions allows the transform domain to be exploited efficiently. We represent the 64 zig-zag-reordered basis functions as w_i, where i = 0 to 63. It is observed that this particular re-ordering results in an increasing frequency pattern (complexity) in the basis functions, i.e., as the index i increases, the frequency content increases from low to high.

Fig. 3: Top row: Tchebichef coefficients for (a) LR, (b) HR, and (c) the difference between the HR and LR coefficients for a medical image. Bottom row: (d)-(f) the same for a natural image.

The average value of the coefficients generated by the convolution of the Tchebichef kernels with HR and LR images, respectively, is shown in Fig. 3. Figs. 3(a)-(b) show the coefficients of the LR and HR versions of a medical image, while Fig. 3(c) shows the difference between the HR and LR image coefficients. Note that the values in Fig. 3(c) are scaled for proper visualization. It can be observed that, as the kernel complexity increases, there is a substantial loss of coefficients in the high-frequency region compared to the low-frequency region. In the Tchebichef domain, the problem of SR thus becomes one of recovering the high-frequency Tchebichef coefficients of the HR image from the corresponding LR image. This observation is incorporated in the proposed architecture, discussed next. A similar analysis is carried out on a natural image, and the results are shown in Figs. 3(d)-(f).

IV. PROPOSED TCHEBICHEF TRANSFORM DOMAIN SUPER-RESOLUTION (TTDSR)

In this section, a detailed description of the proposed architecture for SR, shown in Fig. 4, is given. The architecture consists of the following blocks: (1) Tchebichef convolutional layer (TCL); (2) frequency cube; (3) non-linear mapping for low frequencies; (4) inception-residual connection for high frequencies; (5) inverse Tchebichef transformation layer (ITCL).

A. Network Structure

1) Tchebichef Convolutional Layer (TCL): This block transforms images from the spatial domain to the Tchebichef moment domain and has the basis functions w_i as its kernels. There are 64 such kernels of size 8 × 8, indexed by i; the details are discussed in Sec. III-D. The transformation from the spatial to the Tchebichef moment domain works as follows: the TCL layer creates 64 feature maps f_i for the entire image by performing convolution of w_i with the image G, as given in (14). Here, \circledast represents the convolution operation, performed with stride S = 1 and 'same' padding in order to preserve the dimensions of the image:

f_i = w_i \circledast G, \qquad \forall\, i \in \{0, \ldots, 63\} \qquad (14)

The kernels of the TCL layer are kept fixed and non-trainable during the training phase, as the primary role of this layer is to convert images into the transform domain.
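A NumPy sketch of the TCL operation in (14), offered as an illustration rather than the authors' implementation; the layer is realized here as a stride-1, 'same'-padded correlation with each kernel.

```python
import numpy as np

def tcl(image, kernels):
    """Fixed Tchebichef convolutional layer (Eq. 14): stride-1,
    'same'-padded correlation of a 2-D image with each k x k kernel."""
    k = kernels.shape[-1]
    lo, hi = (k - 1) // 2, k // 2          # asymmetric pad for even k
    gp = np.pad(image, ((lo, hi), (lo, hi)))
    win = np.lib.stride_tricks.sliding_window_view(gp, (k, k))
    # feature map i = correlation of the image with kernel w_i
    return np.einsum('xyuv,iuv->ixy', win, kernels)
```

Feeding an image of size H × W with 64 kernels of size 8 × 8 yields a (64, H, W) frequency cube, one map per basis function.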
2) Frequency Cube:
The frequency-domain feature maps f_{i=0,...,63} obtained from (14) are used to form a cube (see label 2 in Fig. 4). This cube is a re-organized version of the Tchebichef coefficients, calculated for the whole image and ordered by increasing frequency content (complexity). Based on the detailed discussion of Fig. 3 in Sec. III-D, there is a substantial loss of coefficients in the high-frequency region compared to the low-frequency region. For this reason, the frequency cube is partitioned into two parts at a split point T. The low- and high-frequency maps are defined as f_low = f_{i=1,...,T} and f_high = f_{i=T+1,...,63}, respectively. Fig. 5 shows the details of this partitioning process. The split point T is determined experimentally, and its optimal value is found to be 5; the discussion of its optimal value is carried out in the experimental section.

Fig. 4: The proposed TTDSR network architecture. Please refer to the number markings used in the explanation of the architecture. Blue and green are used for the high- and low-frequency details, respectively. A yellow background indicates processing in the transform domain, and light blue the spatial domain.

Fig. 5: The cube is partitioned at a split point T to generate the high-frequency and low-frequency cubes.
The proposed architecture processes the partitioned cubes f_low and f_high separately. It can be observed from Figs. 3(c) and (f) that there is a larger loss of coefficients in the high-frequency region; the high-frequency block f_high therefore requires a more robust and complex mapping to recover the HR image from the LR image. On the other hand, the coefficient loss in the low-frequency region is not as significant, but it still plays an important role in image quality. Hence, the low-frequency block f_low is processed using a simpler non-linear convolutional mapping to recover the image details. Next, we discuss the simple and complex deep learning architectures for f_low and f_high.
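The partitioning itself reduces to slicing the cube at T. A toy sketch, with illustrative array sizes:

```python
import numpy as np

# Hypothetical frequency cube: 64 zig-zag-reordered maps of a 32x32 image.
f = np.random.default_rng(1).random((64, 32, 32))
T = 5  # split point; its optimal value is discussed in Sec. V

# f_low = f_{i=1..T}, f_high = f_{i=T+1..63}, as defined above.
f_low = f[1:T + 1]
f_high = f[T + 1:]
```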
3) Architecture for f_low: The mapping from the low-frequency coefficients of the LR image to the corresponding low-frequency coefficients of the HR image is accomplished via a CNN consisting of two convolutional layers (see the green arrow in Fig. 4), the first of which uses 5 × 5 kernels. The mapping is given as

z^{[0]}_{low} = f_{low}

z^{[k]}_{low} = \max\big(v^{[k]},\ \alpha\, v^{[k]}\big), \qquad v^{[k]} = z^{[k-1]}_{low} \circledast W^{[k]}_{1} + B^{[k]}_{1}, \qquad k \in \{1, 2\} \qquad (15)

where k indexes the two convolutional layers, z^{[k]}_{low} is the output of the k-th layer, W^{[k]}_{1} and B^{[k]}_{1} are the weights and biases of the k-th layer, and \alpha = 0.1 is the leaky-ReLU parameter. The non-linear mapping of (15) recovers the information lost in the lower-frequency spectrum of the image.
4) Architecture for f_high: To recover the information lost in the high-frequency spectrum of the image, a non-linear mapping is implemented using a deep learning architecture inspired by the inception network [36] (see the blue arrow in Fig. 4). The high-frequency feature maps f_high of the LR image are fed to three parallel convolutional paths, each using a different kernel size, namely 3 × 3, 5 × 5, and 7 × 7. Larger kernel sizes gather global information, while smaller kernel sizes gather information distributed more locally in the feature map. This allows the model to take advantage of multi-level feature extraction. Finally, the features obtained from all the levels are combined, followed by a 1 × 1 convolution, which firstly merges the multi-scale features and secondly reduces the depth of the network. The non-linear mapping for the process described above is given as

z^{[k]}_{high} = \max\big(u^{[k]},\ \alpha\, u^{[k]}\big), \qquad u^{[k]} = f_{high} \circledast W^{[k]}_{2} + B^{[k]}_{2}, \qquad k \in \{1, 2, 3\} \qquad (16)

z^{T}_{high} = \sum_{k=1}^{3} z^{[k]}_{high} \qquad (17)

Here, z^{T}_{high} is the combination of all the feature maps obtained via the three parallel paths indexed by k.
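Both mappings reduce to 'same'-padded convolutions followed by the leaky ReLU max(v, αv). The following NumPy sketch of the three parallel high-frequency paths of (16)-(17) uses illustrative channel widths and random weights, not the trained ones.

```python
import numpy as np

def leaky_relu(v, alpha=0.1):
    return np.maximum(v, alpha * v)        # max(v, alpha*v), as in Eq. (16)

def conv2d_same(x, W, b):
    """'Same'-padded stride-1 convolution. x: (Cin, H, W);
    W: (Cout, Cin, k, k) with odd k; b: (Cout,)."""
    k = W.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    win = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(1, 2))
    return np.einsum('cxyuv,ocuv->oxy', win, W) + b[:, None, None]

rng = np.random.default_rng(0)
f_high = rng.random((16, 12, 12))          # toy high-frequency cube
# Three parallel paths with kernel sizes 3, 5 and 7, summed as in Eq. (17).
z_T_high = sum(leaky_relu(conv2d_same(f_high,
                                      rng.normal(0.0, 0.1, (16, 16, k, k)),
                                      np.zeros(16)))
               for k in (3, 5, 7))
```

Each path preserves the spatial size, so the three outputs can be summed directly.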
5) Inverse Tchebichef Transformation Layer (ITCL):
This layer transforms the image from the Tchebichef moment domain back to the spatial domain. Its input is obtained by combining the low- and high-frequency cubes, z^{[2]}_{low} and z^{T}_{high}, respectively. The output of this layer is the reconstructed spatial-domain image \hat{y}, given as

\hat{y} = \sum_{i} w_i \circledast \big( z^{[2]}_{low} + z^{T}_{high} \big)_i \qquad (18)

Here, the weights of the Tchebichef kernels w_i are trainable, so that during the training process the kernels adapt to the data and provide an efficient inverse transformation.
6) Fine-Tuning Network:
The reconstructed image \hat{y} obtained using (18) is further processed through a small fine-tuning network, shown in Fig. 4, which consists of three convolutional layers. The main purpose of this additional network is to remove minor artifacts from the image \hat{y}.

V. EXPERIMENTAL WORK
A. Training Details
This section discusses the training details of the proposed TTDSR architecture. In order to learn the end-to-end mapping function F for the SR task, optimized values of the network parameters θ ∈ (W^{[k]}_{1}, B^{[k]}_{1}, W^{[k]}_{2}, B^{[k]}_{2}) are required. These parameters are obtained by minimizing the loss between the network-generated SR image F(Y_i, θ) and the high-resolution ground-truth image X_i. Given a batch of high-resolution images X_i and the corresponding low-resolution images Y_i, the loss function is given as

L(\theta) = \frac{1}{M} \sum_{i=1}^{M} \| F(Y_i; \theta) - X_i \|^2 + \lambda \sum_{j=1}^{l} \| W_j \|^2 \qquad (19)

where M is the total number of training images, λ is the regularization parameter, and l is the total number of kernels used in the architectures discussed in Sections IV-A3 and IV-A4. The loss is minimized using the adaptive moment estimation (Adam) [37] optimizer with standard back-propagation [38], which computes adaptive learning rates for each parameter. The training phase uses Adam with its default parameters: β_1 = 0.9 (first-order moments), β_2 = 0.999 (second-order moments), and a small constant ε for numerical stability. The learning rate η is initialised to a fixed value. The filter weights for each layer are initialized with the Glorot uniform initializer, which draws samples from a uniform distribution. It is observed that without regularization the network becomes highly unstable; hence, L2 regularization with parameter λ is applied to penalize the network weights. The TCL is kept non-trainable, while the ITCL is kept trainable to obtain the best optimized Tchebichef kernels. There are 14 convolutional layers in the TTDSR architecture, leading to a total of 94k parameters, out of which 90k are trainable and the remaining are the fixed parameters used in the TCL layer. The network is trained for 100 epochs with a batch size of 64. The training and testing phases are conducted on an NVIDIA GeForce RTX 2080 Ti GPU with TensorFlow as the support package.
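The loss in (19) can be sketched directly; the regularization weight `lam` below is illustrative, not the paper's exact value.

```python
import numpy as np

def ttdsr_loss(preds, targets, weights, lam=0.01):
    """Eq. (19): average per-image squared error over the batch plus
    an L2 penalty on the kernel weights (lam is illustrative)."""
    mse = np.mean([np.sum((p - t) ** 2) for p, t in zip(preds, targets)])
    l2 = sum(np.sum(W ** 2) for W in weights)
    return mse + lam * l2
```

In the paper this objective is minimized with Adam and standard back-propagation; the sketch only evaluates the scalar loss.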
B. Datasets
Several widely used datasets exist for image SR. Images from the T91 [39] and DIV2K [11] datasets have been combined to create the training dataset. For a robust model, data augmentation is used, where the images are augmented in three ways:
a. The images are rotated by 45°, 90°, 135°, 180°, and 225°.
b. Horizontal and vertical flips of the images are performed.
c. The images are scaled by factors of 0.6, 0.7, 0.8, and 0.9.
These augmented images are variations of the HR images, which are then down-sampled by a factor of η. The down-sampled images are scaled up using bicubic interpolation with the same factor η to form the degraded LR images for training. Training images are first converted from RGB to YCbCr format. Inspired by [9], [15], [40], the luminance (Y) channel is used as the input to the architecture, while the Cb and Cr channels of the LR images are directly up-scaled using bicubic interpolation. Finally, the up-scaled Cb and Cr channels are combined with the luminance (Y) channel predicted by the architecture to generate the SR image, which is then converted back to RGB format. As the proposed architecture is trained on a single channel, i.e., Y in the YCbCr domain, it offers the flexibility of performing transfer learning on COVID-19 medical images, which are gray-scale; SR results for these are analyzed in the experimental work. During the testing phase, several standard datasets, namely Set5 [41], Set14 [42], BSDS100 [43], and Urban100 [44], are utilized to evaluate the performance of the proposed architecture. The metrics used for image quality assessment are PSNR and SSIM [45]. A few published methods work with larger datasets such as DIV2K [11], ImageNet [46], and MS-COCO [47]; however, our choice of datasets has been made to stay consistent with the majority of the methods compared against. The above test datasets mainly consist of natural images.
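The RGB-to-YCbCr step can be sketched with the standard ITU-R BT.601 (JFIF) matrices. The paper does not state its exact conversion constants, so the coefficients below are an assumption.

```python
import numpy as np

def rgb_to_ycbcr(img):
    """ITU-R BT.601 full-range conversion (assumed constants). Only the
    Y channel feeds the network; Cb/Cr are upscaled separately."""
    m = np.array([[0.299, 0.587, 0.114],
                  [-0.168736, -0.331264, 0.5],
                  [0.5, -0.418688, -0.081312]])
    out = img @ m.T
    out[..., 1:] += 128.0       # centre the chroma channels
    return out

def ycbcr_to_rgb(img):
    """Inverse of the above, using the standard JFIF coefficients."""
    m_inv = np.array([[1.0, 0.0, 1.402],
                      [1.0, -0.344136, -0.714136],
                      [1.0, 1.772, 0.0]])
    shifted = img - np.array([0.0, 128.0, 128.0])
    return shifted @ m_inv.T
```

The round trip is lossless up to small numerical error, so predicting only Y and reattaching the upscaled Cb/Cr channels is well defined.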
As pointed out earlier, transfer learning is carried out on the proposed architecture, which is then tested on medical images. For this, the COVID-19 image database collected by Cohen et al. [48] is used. The dataset contains chest X-ray and computed tomography (CT) images, mainly in gray-scale format, and is a collection of anterior-posterior views of chest X-rays. The dataset is continuously updated, and it is worth mentioning that the resolution varies from image to image. A sample of these images can be found in Fig. 6.
Fig. 6: Sample images from the COVID-19 dataset, which contains both X-ray and CT images.
C. Comparative Analysis
In this section, the performance of the proposed architecture is compared with other existing methods, evaluated on the standard datasets of Sec. V-B. The following methods are used for comparison with our architecture:
a. ScSR [9]: sparse-coding-based SR method that constructs an LR-HR image patch dictionary.
b. A+ [49]: adjusted anchored neighbourhood regression for fast super-resolution; an updated and modified version of [50].
c. SelfEx [44]: self-similarity-based method that measures the similarity within images.
d. SCN [51]: sparse-prior method implemented with the help of a CNN.
e. SRCNN [12]: the earliest deep learning method for image SR based on a CNN architecture.
f. FSRCNN [13]: an advanced and modified version of SRCNN with a deeper architecture and a transposed-convolution approach.
g. VBPS [52]: a recent method for image SR that exploits the inherent self-similarities found in images.
Tables I and II report the PSNR and SSIM results of TTDSR and the other methods, respectively. Among these methods, FSRCNN and VBPS give competitive scores when compared with TTDSR; however, TTDSR on average performs well across all kinds of datasets. We now make a subjective comparison of the SR results using the various methods. Figs. 7 and 8 show the SR results for TTDSR and the other methods in enhancing the quality of images degraded by bicubic interpolation. Fig. 7 shows the SR results on the monarch image; the enlarged version shows the thin black edge at the head of the monarch. It can be observed that the bicubic-interpolated image suffers heavy loss of the thin edge details, with discontinuities, and the other methods also fail to generate the edges gracefully. The proposed TTDSR architecture generates a clear edge, overcomes the discontinuity artifact observed in the other methods, and gives better results in terms of PSNR and SSIM. Fig. 8 shows the SR results on the bicubic-interpolated zebra image.
It can be observed that the black and white stripes on the zebra lack detail and that the orientation of the edges is not captured. Further, the FSRCNN method shows slightly better results than our method in terms of PSNR and SSIM, but its diagonal edges overlap, leading to poor visual quality. TTDSR, though second best numerically, exploits the frequency-domain details to overcome this degradation and thereby generates correctly oriented black and white stripes.

Next, SR results on COVID-19 medical images are presented. Here, the aim is not to detect COVID-19 infection from the images, but to enhance their quality to support better diagnosis. The dataset used for this purpose is that of Cohen et al. [48]. As our model is trained on a single channel, i.e., Y (luminance), and medical images are gray-scale images containing only luminance and no colour information, TTDSR can be applied to medical images using transfer learning. The SR results can be seen in Fig. 9, and the average PSNR and SSIM comparison on the COVID-19 dataset in Table III. It can be observed that the proposed method gives better results compared to the other methods.
D. Network Parameters and their Impact

1) Split Point for the Tchebichef Frequency Cube: The Tchebichef polynomials are treated as filters and create a frequency cube in the image transform domain, as shown in Fig. 5. In the architecture (Fig. 4), there are two sub-networks: one recovers the loss in the high-frequency content and the other the loss in the low-frequency content. The frequency cube is split into two halves at a split point T. The split is determined experimentally, and the performance of the network varies with different values of T. In this paper, T = 5 is used, based on the experiment reported in Fig. 10, where for different datasets the average PSNR reported by the network is highest when T = 5.
2) Impact of Residual Connection:
Initially, the network was structured without residual connections and relied only on the inception-based module. In this case, the network performed with limited capability due to the vanishing gradient problem. To overcome this, two major residual connections were added to the network: a local residual connection for the high-frequency components (f_high) and an overall global residual connection for both the high- and low-frequency components. The local residual connection [53] in the inception module was introduced to boost the gradients in the training phase for the recovery of the high-frequency components of the image. The use of local residual connectivity overcomes the vanishing gradient problem and helps the optimizer reach the minimum faster. The corresponding experimental analysis is also reported.

TABLE I
PSNR comparison for various SR methods. Blue represents the best and red the second best result (color coding from the original table).

Dataset    Scale  Bicubic  ScSR   A+     SelfEx  SCN    SRCNN  FSRCNN  VBPS   TTDSR
                  [9]      [49]   [44]   [51]    [12]   [13]   [52]           (Proposed)
Set5       2x     33.64    35.78  36.55  36.50   36.58  36.66  36.94   36.97  35.35
           3x     30.39    31.34  32.58  32.62   32.61  32.75  33.06   33.23
           4x     28.42    29.07  30.27  30.32   30.41  30.48  30.55   31.19
Set14      2x     30.22    31.64  32.29  32.24   32.35  32.42  32.54   32.83
           3x     27.53    28.19  29.13  29.16   29.16  29.28  29.37   29.61
           4x     25.99    26.40  27.33  27.40   27.39  27.40  27.50   27.86
BSDS100    2x     29.55    30.77  31.21  31.18   31.26  31.36  31.66   -
           3x     27.21    27.72  28.18  28.30   28.58  28.20  28.52   -
           4x     25.96    26.61  26.82  26.84   26.88  26.84  26.92   -
Urban100   2x     26.66    28.26  29.20  29.54   29.52  29.50  29.87   30.36
           3x     24.46    25.34  26.03  25.69   25.56  26.24  26.35   26.94
           4x     23.14    24.02  24.32  24.78   25.13  24.52  24.61   25.12
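The PSNR figures in Tables I and II follow the standard definition over 8-bit images. A minimal sketch of the metric, assuming a peak value of 255 and evaluation on the luminance channel:

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Since the score is a log of the inverse mean squared error, a gain of even a few tenths of a dB between neighboring columns of the table corresponds to a measurable reduction in reconstruction error.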
TABLE II
SSIM comparison for various SR methods. Blue represents the best and red the second best result (color coding from the original table).

Dataset    Scale  Bicubic  ScSR    A+      SelfEx  SCN     SRCNN   FSRCNN  TTDSR
                  [9]      [49]    [44]    [51]    [12]    [13]            (Proposed)
Set5       2x     0.9292   0.9485  0.9544  0.9538  0.9540  0.9542  0.9558
           3x     0.8678   0.8869  0.9088  0.9092  0.9080  0.9090  0.9140
           4x     0.8101   0.8263  0.8605  0.8640  0.8630  0.8628  0.8657
Set14      2x     0.8683   0.8940  0.9055  0.9032  0.9050  0.9063  0.9088  0.8992
           3x     0.7737   0.7977  0.8188  0.8196  0.8180  0.8209  0.8242
           4x     0.7023   0.7218  0.7489  0.7518  0.7510  0.7503  0.7535
BSDS100    2x     0.8425   0.8744  0.8864  0.8855  0.8850  0.8879  0.8920  0.8747
           3x     0.7382   0.7647  0.7836  0.7778  0.7910  0.7863  0.7897
           4x     0.6672   0.6983  0.7087  0.7106  0.7110  0.7101  0.7201
Urban100   2x     0.8408   0.8828  0.8938  0.8967  0.8970  0.8946  0.9010  0.8729
           3x     0.7349   0.7827  0.7973  0.7864  0.8016  0.7989  0.7512  0.7848
           4x     0.6573   0.7024  0.7186  0.7374  0.7260  0.7221  0.7270  0.7324
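SSIM [45] compares luminance, contrast, and structure between the two images. The reference implementation averages the score over sliding Gaussian windows; the core formula can be sketched with global image statistics instead (a single-window simplification, not the windowed version behind the table):

```python
import numpy as np

def ssim_global(x, y, peak=255.0):
    """Single-window SSIM: the Wang et al. formula evaluated once over
    the whole image rather than averaged over local windows."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # standard constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

The score is 1 only for identical images; structural distortions such as the misoriented zebra stripes discussed above lower the covariance term, which is why SSIM separates the methods more sharply than PSNR in Fig. 8.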
Fig. 7: SR results on the image monarch.bmp from Set14 for a scale factor of 3 using different methods, reported as (PSNR, SSIM): Bicubic (29.99, 0.523), ScSR (29.44, 0.593), A+ (29.70, 0.639), SCN (27.72, 0.681), SelfEx (30.14, 0.705), SRCNN (29.74, 0.631), FSRCNN (29.64, 0.669), TTDSR (30.11, 0.772).

Fig. 8: SR results on the image zebra.bmp from Set14 for a scale factor of 3 using different methods, reported as (PSNR, SSIM): Bicubic (26.55, 0.795), ScSR (27.79, 0.831), A+ (27.93, 0.850), SCN (22.84, 0.809), SelfEx (27.62, 0.821), SRCNN (27.93, 0.847), FSRCNN (28.137, 0.8521), TTDSR (27.99, 0.849).

Fig. 9: Comparison of SR results on the COVID-19 image dataset using different methods, evaluated at a scale factor of 2.
TABLE III
PSNR/SSIM comparison on the COVID-19 dataset.

Dataset   Scale  Bicubic       SRCNN         TTDSR (Proposed)
COVID-19  2x     41.32/0.9419  41.76/0.9493
          3x     40.19/0.9248  40.25/0.9281
          4x     39.09/0.9059  38.96/0.9077

Fig. 11 shows that the architecture with residual connections converges to a smaller loss L than the variant trained without them.
3) Optimized Learned Tchebichef Filters:
As discussed, the network architecture consists of two custom layers, namely TCL and ITCL. The kernel functions used in TCL are fixed, whereas those in ITCL are kept trainable so that they adapt during the training phase of the network. Fig. 12 shows the optimized kernels obtained after the training process. These optimized kernels are used to reconstruct the image from the Tchebichef moment domain and hence contribute to providing better image quality when compared to other methods.

Fig. 10: Average PSNR on the Set5, Set14, BSDS100 and Urban100 datasets for different values of T.

Fig. 11: Training loss with and without the residual architecture.
Fig. 12: Optimized Tchebichef kernels obtained after the training process.
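For reference, the fixed analysis kernels in TCL come from the orthonormal discrete Tchebichef polynomials [29]. A sketch of their standard three-term recurrence, which makes the forward/inverse transform pair exact before any training (the block-wise forward/inverse helpers are illustrative; the paper realizes them as convolutional layers):

```python
import numpy as np

def tchebichef_basis(N):
    """N x N orthonormal discrete Tchebichef basis; row n holds the
    degree-n polynomial t_n evaluated at x = 0, ..., N-1."""
    x = np.arange(N, dtype=np.float64)
    T = np.zeros((N, N))
    T[0] = 1.0 / np.sqrt(N)
    if N > 1:
        T[1] = (2.0 * x + 1.0 - N) * np.sqrt(3.0 / (N * (N ** 2 - 1)))
    for n in range(2, N):
        b = np.sqrt((4.0 * n ** 2 - 1.0) / (N ** 2 - n ** 2))
        a1, a2 = (2.0 / n) * b, ((1.0 - N) / n) * b
        a3 = ((n - 1.0) / n) * np.sqrt((2.0 * n + 1.0) / (2.0 * n - 3.0)) \
             * np.sqrt((N ** 2 - (n - 1.0) ** 2) / (N ** 2 - n ** 2))
        T[n] = (a1 * x + a2) * T[n - 1] - a3 * T[n - 2]
    return T

def tchebichef_moments(block):
    """Forward transform of a square block (the role played by TCL)."""
    T = tchebichef_basis(block.shape[0])
    return T @ block @ T.T

def inverse_tchebichef(moments):
    """Inverse transform (the role of ITCL before its kernels are trained)."""
    T = tchebichef_basis(moments.shape[0])
    return T.T @ moments @ T
```

Because the basis is orthonormal, the inverse is simply the transpose, so initializing ITCL from this matrix gives lossless reconstruction; training then only has to learn the deviation needed for super-resolution.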
VI. CONCLUSION
A deep learning architecture for the super-resolution of natural and COVID-19 medical images is presented. It makes use of the Tchebichef transform domain, which helps in exploiting the low and high frequency details present in the images to enhance their quality. A detailed analysis of the various parameters that affect the performance of the architecture is provided. The objective and visual comparison of the SR results with the existing methods shows that the proposed architecture provides superior results when evaluated in terms of the average PSNR and SSIM metrics. The visual comparison of the results shows that our work restores the details present in an image effectively. This work opens up the possibility of exploring SR architectures using different transform domains.

REFERENCES
[1] Sung Cheol Park, Min Kyu Park, and Moon Gi Kang. Super-resolution image reconstruction: A technical overview. IEEE Signal Processing Magazine, 20(3):21-36, May 2003.
[2] S. Farsiu, M. D. Robinson, M. Elad, and P. Milanfar. Fast and robust multiframe super resolution. IEEE Transactions on Image Processing, 13(10):1327-1344, 2004.
[3] Sina Farsiu, Dirk Robinson, Michael Elad, and Peyman Milanfar. Advances and challenges in super-resolution. International Journal of Imaging Systems and Technology, 14, 2004.
[4] Q. Yuan, L. Zhang, and H. Shen. Multiframe super-resolution employing a spatially weighted total variation model. IEEE Transactions on Circuits and Systems for Video Technology, 22(3):379-392, 2012.
[5] Xuelong Li, Yanting Hu, Xinbo Gao, Dacheng Tao, and Beijia Ning. A multi-frame image super-resolution method. Signal Processing, 90:405-414, 2010.
[6] S. Mallat and G. Yu. Super-resolution with sparse mixing estimators. IEEE Transactions on Image Processing, 19(11):2889-2900, 2010.
[7] Hong Chang, Dit-Yan Yeung, and Yimin Xiong. Super-resolution through neighbor embedding. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), volume 1, pages I-I, 2004.
[8] D. Glasner, S. Bagon, and M. Irani. Super-resolution from a single image. Pages 349-356, 2009.
[9] J. Yang, J. Wright, T. S. Huang, and Y. Ma. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11):2861-2873, 2010.
[10] L. Zhang and W. Zuo. Image restoration: From sparse and low-rank priors to deep priors [lecture notes]. IEEE Signal Processing Magazine, 34(5):172-179, 2017.
[11] R. Timofte et al. NTIRE 2017 challenge on single image super-resolution: Methods and results. Pages 1110-1121, 2017.
[12] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295-307, 2015.
[13] Chao Dong, Chen Change Loy, and Xiaoou Tang. Accelerating the super-resolution convolutional neural network. In European Conference on Computer Vision, pages 391-407. Springer, 2016.
[14] J. Kim, J. K. Lee, and K. M. Lee. Deeply-recursive convolutional network for image super-resolution. Pages 1637-1645, 2016.
[15] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. Pages 1646-1654, 2016.
[16] W. T. Freeman, T. R. Jones, and E. C. Pasztor. Example-based super-resolution. IEEE Computer Graphics and Applications, 22(2):56-65, 2002.
[17] R. Keys. Cubic convolution interpolation for digital image processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(6):1153-1160, 1981.
[18] Claude Duchon. Lanczos filtering in one and two dimensions. Journal of Applied Meteorology, 18:1016-1022, 1979.
[19] Shengyang Dai, Mei Han, Wei Xu, Ying Wu, Yihong Gong, and Aggelos Katsaggelos. Softcuts: A soft edge smoothness prior for color image super-resolution. IEEE Transactions on Image Processing, 18:969-981, 2009.
[20] Jian Sun, Zongben Xu, and Heung-Yeung Shum. Image super-resolution using gradient profile prior. Pages 1-8, 2008.
[21] Q. Yan, Y. Xu, X. Yang, and T. Q. Nguyen. Single image superresolution based on gradient profile sharpness. IEEE Transactions on Image Processing, 24(10):3187-3202, 2015.
[22] Antonio Marquina and Stanley Osher. Image super-resolution by TV-regularization and Bregman iteration. Journal of Scientific Computing, 37:367-382, 2008.
[23] Yu Chen, Ying Tai, Xiaoming Liu, Chunhua Shen, and Jian Yang. FSRNet: End-to-end learning face super-resolution with facial priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[24] Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy. Recovering realistic texture in image super-resolution by deep spatial feature transform. CoRR, abs/1804.02815, 2018.
[25] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS'14), volume 2, pages 2672-2680, 2014.
[26] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In Computer Vision - ECCV 2016, pages 694-711, 2016.
[27] J. Li, S. You, and A. Robles-Kelly. A frequency domain neural network for fast image super-resolution. Pages 1-8, 2018.
[28] T. Guo, H. Seyed Mousavi, and V. Monga. Adaptive transform domain image super-resolution via orthogonally regularized deep networks. IEEE Transactions on Image Processing, 28(9):4685-4700, 2019.
[29] R. Mukundan, S. H. Ong, and P. A. Lee. Image analysis by Tchebichef moments. IEEE Transactions on Image Processing, 10(9):1357-1364, 2001.
[30] Hongqing Zhu, Huazhong Shu, Ting Xia, Limin Luo, and Jean Louis Coatrieux. Translation and scale invariants of Tchebichef moments. Pattern Recognition, 40(9):2530-2542, 2007.
[31] Bin Xiao, Jian-Feng Ma, and Jiang-Tao Cui. Radial Tchebichef moment invariants for image recognition. Journal of Visual Communication and Image Representation, 23(2):381-386, 2012.
[32] Haiyong Wu and Senlin Yan. Computing invariants of Tchebichef moments for shape based image retrieval. Neurocomputing, 215:110-117, 2016.
[33] Ahlad Kumar, M. Omair Ahmad, and M. N. S. Swamy. Tchebichef and adaptive steerable-based total variation model for image denoising. IEEE Transactions on Image Processing, 28(6):2921-2935, 2019.
[34] Ahlad Kumar, M. Omair Ahmad, and M. N. S. Swamy. Image denoising via overlapping group sparsity using orthogonal moments as similarity measure. ISA Transactions, 85:293-304, 2019.
[35] G. K. Wallace. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics, 38(1):xviii-xxxiv, 1992.
[36] C. Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. Pages 1-9, 2015.
[37] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, 2014.
[38] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[39] S. Schulter, C. Leistner, and H. Bischof. Fast and accurate image upscaling with super-resolution forests. Pages 3791-3799, 2015.
[40] W. Dong, L. Zhang, G. Shi, and X. Wu. Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Transactions on Image Processing, 20(7):1838-1857, 2011.
[41] Marco Bevilacqua, Aline Roumy, Christine Guillemot, and Marie-Line Alberi Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the British Machine Vision Conference, pages 135.1-135.10, 2012.
[42] Roman Zeyde, Michael Elad, and Matan Protter. On single image scale-up using sparse-representations. In Curves and Surfaces, pages 711-730, 2012.
[43] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), volume 2, pages 416-423, 2001.
[44] J. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. Pages 5197-5206, 2015.
[45] Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600-612, 2004.
[46] J. Deng, W. Dong, R. Socher, L. Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. Pages 248-255, 2009.
[47] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In Computer Vision - ECCV 2014, pages 740-755, 2014.
[48] Joseph Paul Cohen, Paul Morrison, and Lan Dao. COVID-19 image data collection. arXiv:2003.11597, 2020.
[49] R. Timofte, V. D. Smet, and L. Gool. A+: Adjusted anchored neighborhood regression for fast super-resolution. In ACCV, 2014.
[50] R. Timofte, V. De, and L. V. Gool. Anchored neighborhood regression for fast example-based super-resolution. Pages 1920-1927, 2013.
[51] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang. Deep networks for image super-resolution with sparse prior. Pages 370-378, 2015.
[52] G. Chantas, S. N. Nikolopoulos, and I. Kompatsiaris. Heavy-tailed self-similarity modeling for single image super resolution. IEEE Transactions on Image Processing, 30:838-852, 2021.
[53] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors,