An enhanced performance for H.265/SHVC based on combined AEGBM3D filter and back-propagation neural network
L. Balaji
Department of ECE, Velammal Institute of Technology, Chennai, India E-mail: [email protected]
K. K. Thyagharajan
ECE Department, RMD Engineering College, Chennai, India E-mail: [email protected]
Abstract
This paper deals with the latest video coding standard H.265/SHVC, the scalable extension to High Efficiency Video Coding (HEVC). HEVC introduces new coding tools compared to its predecessor and is backward compatible with a wide range of electronic gadgets. A major problem is that gadgets with different display capabilities cannot be offered the same video quality because of constraints in transmission bandwidth. This paper addresses the problem through compression of the video sequence, preserving or increasing PSNR while reducing bit-rate, by means of a novel method implemented in the SHVC encoder. The method combines AEGBM3D (adaptive edge guided block-matching and 3D) filtering with a back-propagation (BP) technique. The AEGBM3D filter avoids spatial redundancy and de-noises frames; hence an enhancement in PSNR is achieved. The PSNR obtained for the video is compared with a set threshold PSNR, and repeated AEGBM3D filtering maintains the PSNR above the threshold. The BP technique, based on the neural network machine learning approach, continually restrains the output when an input block does not contain a feature the network was trained to recognize. This frequent control over the output produces fewer bits; hence a reduction in bit-rate is achieved. The simulation results show that the proposed technique delivers average increments of 0.16 and 0.25 dB in PSNR and average decrements of 28 and 37% in bit-rate for ×1.5 and ×2 spatial ratios, respectively.
Keywords
H.265 SHVC · Back-propagation · AEGBM3D filter · PSNR · Bit-rate · Neural network
Introduction
A new video compression standard, popularly known as H.265/High Efficiency Video Coding (HEVC), was introduced by ITU-T VCEG and ISO/IEC MPEG in 2013. The HEVC encoder surpasses its predecessor H.264/Advanced Video Coding (AVC) with higher compression, i.e., up to 45.54% in terms of bit-rate [1]. Such high compression is achieved by its large number of intra-modes and viable inter-prediction modes. Owing to this increased number of modes and its flexibility for compression, the computational complexity of HEVC is higher than that of AVC. A scalable extension of HEVC received major attention from the industry, which called for developing a standard. In 2014, MPEG and ITU launched the extended version of HEVC, known as SHVC [2]. The scalable extension uses inter-layer prediction modes along with the superior coding features of HEVC. Since HEVC is already complex, its scalable extension is likely to be considerably more complex than its predecessor H.264/Scalable Video Coding (SVC). As a result, one major challenge in the general implementation of SHVC is reducing its computational complexity. The computational complexity of SHVC increases mainly because of the mode search, among the available modes, for the minimum rate-distortion cost. In addition to the mode search, partitioning of the Coding Tree Unit (CTU) is computationally demanding. Although many algorithms were proposed to reduce the computational complexity of its predecessor H.264/SVC, none of them is directly suitable for SHVC, since SHVC uses several intra-modes and flexible inter-prediction modes.
Every frame in HEVC is partitioned into square-shaped, equal-sized blocks. The fundamental block is the coding tree block (CTB). For every luminance CTB, two CTBs of related chrominance samples exist. The set of one luminance CTB, two chrominance CTBs and the associated syntax among these blocks is known as a coding tree unit (CTU). Unlike H.264/AVC, HEVC employs a quad-tree-based coding structure that supports various block sizes. To enhance compression, every CTU can be split into smaller square blocks known as coding units (CUs). Like the CTU, every CU contains one block of luminance and two blocks of chrominance samples, together with the associated syntax needed for transformation. To achieve improved coding efficiency, every CU is split into 4 smaller CUs, and the procedure of splitting the CTU into CUs is continued for F iterations, where F denotes the highest CU depth in the CTU quad-tree structure. Afterward, the CUs can be split into smaller prediction units (PUs) and transform units (TUs) [3–5]. Unlike H.264/AVC, HEVC offers a larger number of intra-prediction modes to minimize spatial redundancy and more flexible motion compensation to minimize temporal redundancy [1]. To remove spatial redundancy, HEVC holds 35 luminance intra-prediction modes against 9 intra-modes in H.264/AVC. Furthermore, intra-prediction is designed for various block sizes, ranging from 4 × 4 up to 32 × 32. The intra-prediction coding procedure in HEVC grants 35 intra-modes, including the DC and Planar modes, for the luminance component of every PU. The size of the PU [1] determines the maximum number of modes to be evaluated as the likely mode using the rate-distortion cost in HEVC. Apart from intra-modes, the encoder decides among motion merge mode, skip mode and explicit encoding of motion information. For every PU block to be coded, the motion merge mode involves generating a record of formerly coded spatially and temporally adjacent PUs, known as candidates. The motion data for the present PU is duplicated from a chosen candidate; rather than encoding a motion vector for the PU, only the index of a candidate in the motion merge record is encoded in addition to the residual. In skip mode, the encoder indicates the index of a motion merge candidate, and the motion parameters for the present PU are duplicated from the chosen candidate, with no residual information sent. In explicit encoding mode, Symmetric and Asymmetric Motion Partitions (AMP) are applied to inter-coded CUs. In AMP, the CUs are partitioned into several small PUs of non-square shape. AMP can be applied to CU sizes from 64 × 64 down to 16 × 16, improving coding efficiency, since the resulting PUs more accurately depict the conventional shapes of objects and do not require further splitting [1]. Every inter-predicted PU contains a record of motion parameters, which includes a motion vector, a reference frame index and a reference list flag. The scalable version of HEVC (SHVC) exploits the same sophisticated prediction methods as HEVC. To improve coding efficiency [6], SHVC incorporates inter-layer motion and inter-layer texture prediction. In inter-layer motion prediction, the motion data (motion vector and reference frame index) of the co-located CU in the base layer are appended to the merge candidate list [7], in addition to the adjacent PUs already used in HEVC. In inter-layer texture prediction, the reconstructed signal of the related CU in the base layer is interpolated, and a CU in the enhancement layer is then predicted from the resulting signal. Here, inter-layer reference frames are used as reference frames along with temporal reference frames [2]. In view of the above, we make an attempt to enhance the performance of SHVC. It is driven by a hybrid image blocking CAFBP scheme for PSNR improvement while compressing bits. The AEGBM3D filter is applied to avoid spatial redundancies and de-noise frames; hence an improvement in PSNR is achieved. The BP technique, rooted in machine learning neural networks, is integrated to restrain the error signal, which reduces bit-rate. The rest of the paper is organized as follows: Sect. 2 explains the work related to SHVC, Sect. 3 explains the CAFBP scheme, the simulation results are discussed in Sect. 4, and finally, Sect. 5 concludes the scheme.
Related work
Lately, numerous works related to SHVC have focused on low-complexity computation, fast mode decision, rate control techniques, etc. Many computational complexity reduction methods have already been discussed for HEVC [7–18]. In [7] a fast mode search algorithm was implemented for intra-prediction. CTU splitting and pruning methods applied at an early stage of intra-prediction are implemented in [8]. Considering that inter-prediction is the more complex part of HEVC under the low-delay and random access profiles, different methods to reduce the complexity of inter-prediction are discussed in [9–16]. Early merge mode identification, which utilizes the root block mode, the all-zero block mode and motion estimation, is discussed in [12]. Quick selection of the coding unit (CU) size using motion divergence is suggested in [14]. A successful CU size detection scheme is discussed in [15], which uses two strategies to reduce computation: the first estimates the quad-tree depth range, and the second minimizes the computation of motion estimation for all small block sizes. Since SHVC utilizes the superior coding features of HEVC, the discussed computational complexity reduction schemes for HEVC can be applied to both layers (base and enhancement). Lately, many computational complexity minimization techniques have been proposed for the SHVC encoder [19–25]. An early-termination mode search scheme [20] and a dynamic search range scheme [19] are discussed for quality scalability. The mode search scheme uses the rate-distortion cost values of adjacent blocks to predict the rate-distortion cost of the block to be coded in the higher layer (enhancement layer). The dynamic search range scheme uses the motion information of the lower layer (base layer) to dynamically alter the search range in the higher layer (enhancement layer).
A fast mode method based on a Bayesian classifier for quality scalability is discussed in [21]; it exploits the modes of adjacent blocks and of the related block in the base layer to predict the best likely mode for the present block in the enhancement layer. A dynamic scheme for spatial scalability, which predicts the search range for enhancement layer blocks by exploiting the motion information of the related blocks in the base layer, is discussed in [22]. A few works [19–22], discussed at the early stage of SHVC, validated their performance using the HEVC reference encoder. Here, we intend to build a computational complexity minimization scheme specifically planned for the quality and spatial scalable versions of HEVC. With the proposed method, the encoder complexity is minimized by avoiding checking all the CU sizes for the CTU partition. In [26] a technique is suggested to enable differential coding in a scalable codec design without influencing the internal coding, therefore enabling a functional implementation to reuse single-layer hardware or software components. This is accomplished by creating an extra reference picture, called the enhanced inter-layer reference, and inserting it into the enhancement layer (EL) decoded picture buffer and reference picture sets. A scheme for predicting the CTU structure for the scalable extension of HEVC under spatial and quality scalability is the focus of [27]. The algorithm utilizes the CTU structure of the previously coded CTUs in the higher layer (EL) and lower layer (BL) to anticipate the coding unit sizes of the CTU blocks to be coded in the higher layer. A reduced-complexity method using two higher layers (EL) is proposed for the SNR scalable extension of HEVC [28].
The proposed method can make the encoding or broadcasting of several different quality versions of the same video in one bit-stream an attractive proposition for the content delivery industry, allowing cost-effective digital media delivery to an assortment of playback display devices. The methods proposed so far may reduce the complexity of the encoder and save encoding time, but the growth in content brought by new services requires data to be compressed further before transmission over networks. Moreover, conveying such video data over networks needs precise control of the bit-rate from the coders to meet rigid constraints on transmission capacity and QoS. A few contributions have been proposed to jointly encode scalable streams, yet without considering the effect of the bit-rate ratio among layers on compression performance. In [29], the effect of the bit-rate ratio among layers on encoder performance is first explored for several proposed UHD scalable methods. To our knowledge, no method has yet been proposed for the scalable extension of HEVC that can enhance PSNR with fewer bits. The proposed hybrid block matching technique using the AEGBM3D filter and BP technique, which improves PSNR, is described in Sect. 3; its subsections describe the impact of the AEGBM3D filter and explain the neural-network-based BP technique in detail.
Combined AEGBM3D filter and BP scheme
In this proposed work, a combined AEGBM3D filter with a BP-technique-based machine learning neural network is implemented. The purpose of the AEGBM3D filter is to avoid redundancies among the frames spatially; besides, it can de-noise the frames, so an improvement in PSNR is achieved. The BP technique based on the neural network machine learning approach restrains the error signal to reduce bit-rate. In the following sections, a detailed description of the AEGBM3D filter, the BP technique and the CAFBP scheme is presented.
AEGBM3D filter
AEGBM3D is a non-local image modeling technique based on adaptive high-order group-wise models. A video sequence is split into frames, frames into macroblocks, and macroblocks further into blocks. The blocks are processed with a de-noising algorithm to remove the redundant information and noise in the frame. One such video de-noising algorithm [31] executes grouping and filtering, which removes redundancy while maintaining the original signal. This grouping and filtering is adopted in the AEGBM3D filter so as to reduce the computational complexity by avoiding redundancy and to de-noise the frames.
Fig. 1 Machine learning model
Grouping is the process of identifying similar blocks. Similar 2D blocks piled together to form a 3D set of similar items are called a group [30]. The filter processes the blocks to de-noise the frames by executing a predictive-search block matching method [31] in two passes: the first pass performs block matching with hard thresholding, while the second performs block matching with Wiener filtering. Grouping and filtering are performed for every block to de-noise the frames of a video sequence. The grouping is accomplished by a predictive-search block matching method that finds similar blocks among the reference and adjacent blocks; their resemblance is measured in a 3D field that spans the temporal and two spatial dimensions. This method significantly reduces the computational complexity compared to the full search method. Next, a 3D transform is applied to the group to generate a highly sparse representation of the original signal. The filtering (either hard thresholding or Wiener) is applied to the transform coefficients for coefficient reduction as well as noise removal, while maintaining the important part of the original signal. Finally, the estimates of all groups are obtained by applying the inverse 3D transform. A set of overlapping block-wise estimates is obtained for every block after grouping and filtering. The estimates are collected and combined by a weighted averaging process in which the weights are inversely proportional to the reduction of the group spectrum during filtering, and therefore roughly inversely proportional to the total variance of each group. Thus, two passes are applied to de-noise the frames of the video: the first pass performs grouping and hard thresholding to generate a fundamental (intermediary) estimate, and the second pass performs grouping and Wiener filtering, which takes its spectrum from the fundamental estimate.
This joint 3-D transform considerably improves the effectiveness of spectral image approximation.
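The grouping and hard-thresholding pass described above can be sketched as follows. This is an illustrative sketch only: it uses a plain local-window SSD search in place of the predictive search of [31], and a 3D FFT with a fixed magnitude threshold in place of the separable transforms and adaptive thresholds of the actual filter. All function names and parameter values are hypothetical.

```python
import numpy as np

def find_similar_blocks(frame, ref_top, ref_left, bsize=8, search=16, max_blocks=8):
    """Group blocks similar to the reference block by SSD, searched in a
    local window (a simple stand-in for the predictive search of [31])."""
    ref = frame[ref_top:ref_top + bsize, ref_left:ref_left + bsize]
    candidates = []
    for top in range(max(0, ref_top - search),
                     min(frame.shape[0] - bsize, ref_top + search) + 1):
        for left in range(max(0, ref_left - search),
                          min(frame.shape[1] - bsize, ref_left + search) + 1):
            blk = frame[top:top + bsize, left:left + bsize]
            candidates.append((np.sum((blk - ref) ** 2), top, left))
    candidates.sort(key=lambda c: c[0])            # most similar first
    return [(t, l) for _, t, l in candidates[:max_blocks]]

def denoise_group(frame, positions, bsize=8, threshold=30.0):
    """Stack the matched blocks into a 3D group, hard-threshold the 3D
    spectrum and invert the transform (first pass of the filter)."""
    group = np.stack([frame[t:t + bsize, l:l + bsize] for t, l in positions])
    spec = np.fft.fftn(group)                      # 3D transform of the group
    spec[np.abs(spec) < threshold] = 0             # hard thresholding
    return np.real(np.fft.ifftn(spec))             # inverse 3D transform
```

In the full filter, the filtered blocks of all overlapping groups would then be returned to their positions and combined by the weighted averaging described above; the second (Wiener) pass would reuse the same grouping with the first-pass output as its pilot spectrum.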
Back-propagation technique
Figure 1 shows the generic model of a neural network, which contains input, hidden and output layers. The outputs are compared with the desired outputs so that a small error signal is expected over the training sets, as shown in Fig. 2. One neural network that is useful in addressing such problems is the feed-forward network.
Fig. 2 Training sets with desired outputs

Forward pass

(A) One of the set of p training input patterns, x_p = (x_p1, x_p2, ..., x_pn), which may be a binary or real-valued vector, is applied to the input layer.

(B) The activations of the units in the hidden layer are calculated by taking their net input (the sum of the activations of the input layer units they are connected to, weighted by the respective connection weights) and passing it through a transfer function f:

(i) net input of hidden unit j: net_j = Σ_i w_ij x_i (1)
(ii) output of hidden unit j, squashed by the activation function: oh_j = f(net_j) (2)

(C) The activations of the hidden layer units computed in (B) are then used to compute the activations of the output units. Every output oh_j becomes an input to output unit k, and the result passes through the same transfer function:

(i) net input of output unit k: net_k = Σ_j w_jk oh_j (3)
(ii) output of unit k: oo_k = f(net_k) (4)

Backward pass

(A) The difference between the actual activation of each output unit and its desired target activation d_k is found, and this difference is used to generate an error signal for each output unit. A quantity called delta is then calculated for all output units:

(i) error signal of actual output oo_k with respect to its desired output d_k: d_k − oo_k (5)
(ii) the delta term, which contains the partial derivative of the nonlinear activation function for each output unit, equals the error signal multiplied by the output of that unit and by (1 − its output):
δ_ok = (d_k − oo_k) oo_k (1 − oo_k) (6)

(B) The error signals for the hidden layer units are then calculated by summing the deltas of the output units a hidden unit connects to, each multiplied by the weight that connects the hidden and output unit. The deltas for the hidden layer units are then calculated:

(i) error signal for each hidden unit j: Σ_k δ_ok w_kj (7)
(ii) the delta term, which contains the partial derivative of the nonlinear activation function for each hidden unit j, equals its error signal multiplied by its output and by (1 − its output):
δ_hj = oh_j (1 − oh_j) Σ_k δ_ok w_kj (8)

(C) The weight error derivatives (WED) for every weight between the hidden and output units are obtained by taking the delta of each output unit and multiplying it by the activation of the hidden unit it connects to. These weight error derivatives are then used to change the weights between the hidden and output layers:

wed_jk = δ_ok oh_j (9)

i.e., to calculate the weight error derivative between hidden unit j and output unit k, take the delta term of output unit k and multiply it by the output (activation) of hidden unit j.

(D) The weight error derivatives for every weight between input unit i and hidden unit j are computed by taking the delta of each hidden unit and multiplying it by the activation of the input unit it connects to (i.e., that input pattern x_i). These weight error derivatives are then used to change the weights between the input and hidden layers:

wed_ij = δ_hj x_i (10)

To change the actual weights themselves, a learning rate parameter η is used, which controls the size of the weight updates during every BP cycle. The weights at time (t + 1) between the hidden and output layers are set from the weights at time t and the weight error derivatives between the hidden and output layers using

w_jk(t + 1) = w_jk(t) + η wed_jk (11)

In a similar way the weights between the input and hidden units are changed:

w_ij(t + 1) = w_ij(t) + η wed_ij (12)

Using this method, every unit in the network receives an error signal that describes its relative contribution to the total error between the actual output and the target output. Based on the error signal received, the weights connecting the units in the different layers are updated.
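The delta and weight-update rules (5)-(12) can be checked numerically for a single training pair. The sketch below assumes a tiny 2-2-1 network with a sigmoid transfer function; the weights, input and learning rate are all hypothetical values chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical 2-2-1 network: two inputs, two hidden units, one output unit.
x = np.array([1.0, 0.0])                          # input pattern x_p
w_ih = np.array([[0.5, -0.4], [0.3, 0.8]])        # w_ij: input -> hidden
w_ho = np.array([0.6, -0.2])                      # w_jk: hidden -> output
d = 1.0                                           # desired output d_k
eta = 0.5                                         # learning rate

# Forward pass
oh = sigmoid(w_ih.T @ x)                          # hidden activations oh_j
oo = sigmoid(w_ho @ oh)                           # output activation oo_k

# Backward pass
delta_o = (d - oo) * oo * (1 - oo)                # Eq. (6)
delta_h = oh * (1 - oh) * (delta_o * w_ho)        # Eq. (8)

# Weight error derivatives and updates
w_ho_new = w_ho + eta * delta_o * oh              # Eqs. (9), (11)
w_ih_new = w_ih + eta * np.outer(x, delta_h)      # Eqs. (10), (12)
```

After one update, a second forward pass with the new weights produces an output closer to the target d, which is the defining property of the gradient step.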
These two passes are repeated many times for different input patterns and their targets until the error between the actual output of the network and its target output is acceptably small for all members of the training set. This type of training can be applied to much larger networks than the XOR network to solve much more complex problems, yet the fundamental two-pass cycle remains the same. As the network trains, units in the hidden layer organize themselves such that different units learn to recognize different features of the total input space. For example, if a network were trained to react to a pixel image of the letter 'T', one unit may develop as a feature detector for the vertical bar at the top of the 'T'. After training, when given an arbitrary new input pattern that is noisy or incomplete, the units in the hidden layer will respond with an active output if the new input contains something resembling the feature the individual units learned to recognize during training. On the other hand, hidden layer units tend to restrain their outputs if the input pattern does not contain a feature they were trained to recognize. These networks tend to create internal relationships between units to arrange the training data into classes of patterns. In this way, they develop an inner representation that enables them to produce the desired outputs when given the training inputs. This same internal representation can be applied to inputs that were not used during training. The BP network will group these new inputs according to the features they share with the training inputs, i.e., these networks can generalize. The main steps are as follows:
Fig. 3 Proposed block of SHVC encoder using AEGBM3D filter

1. Initialize weights to small random values.
2. Select a training vector pair (input and the corresponding output) from the training set and present the input vector to the inputs of the network.
3. Calculate the actual outputs (forward pass).
4. According to the difference between the actual and desired outputs (error signal), adjust the weights W_o and W_h to reduce the difference (backward pass).
5. Repeat from step 2 for all training vectors.
6. Repeat from step 2 until the error is acceptably small.
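The main steps above can be sketched for the XOR problem mentioned earlier. This is a minimal illustration, not the paper's implementation: the network size (8 hidden units), the bias handling, the learning rate and the epoch count are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR training set: the classic example for the two-pass cycle.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([0., 1., 1., 0.])
Xb = np.hstack([X, np.ones((4, 1))])             # append a bias input

n_hidden, eta = 8, 0.5
w_h = rng.uniform(-0.5, 0.5, (3, n_hidden))      # step 1: small random weights
w_o = rng.uniform(-0.5, 0.5, n_hidden + 1)       # +1 for the hidden-layer bias

for epoch in range(20000):                       # step 6: repeat until error is small
    for x, d in zip(Xb, T):                      # step 2: one training pair at a time
        oh = np.append(sigmoid(x @ w_h), 1.0)    # step 3: forward pass (with bias)
        oo = sigmoid(oh @ w_o)
        delta_o = (d - oo) * oo * (1 - oo)       # step 4: backward pass
        delta_h = oh[:-1] * (1 - oh[:-1]) * delta_o * w_o[:-1]
        w_o += eta * delta_o * oh                # weight updates, Eqs. (11)-(12)
        w_h += eta * np.outer(x, delta_h)

def predict(X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    oh = np.hstack([sigmoid(Xb @ w_h), np.ones((len(X), 1))])
    return sigmoid(oh @ w_o)
```

After training, the outputs of `predict(X)` round to the XOR targets, showing that the hidden units have organized themselves into the internal feature detectors described above.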
CAFBP scheme
The proposed model of the SHVC encoder architecture is shown in Fig. 3. The architecture takes a YUV video sequence, which is converted into frames. The frames are converted into pixel blocks of variable sizes. Each block contains the extracted luminance and chrominance components of a frame, and according to the variance of the extracted features the frame undergoes blocking with sizes of 8 × 8, 16 × 16, 32 × 32 or 64 × 64, whichever is suitable. The AEGBM3D filter, specially designed for spatial predictive filtering, is then applied. It identifies and controls spatial redundancies that exist among frames by coding some original blocks via spatial prediction and other coding techniques. The PSNR of each frame is checked against the set threshold (the highest PSNR among all frames in the video sequence). If the PSNR is lower than the set threshold, the filtering technique is applied repeatedly until the PSNR is held above or equal to the set threshold. Subsequently, the BP technique is applied to restrain the error signal, so that it reduces bit-rate by continually controlling the outputs using SHVC encoding techniques. Exploiting the temporal relations among blocks of subsequent frames, the changes between frames are encoded. This is accomplished via motion estimation and compensation, where searches are performed on adjacent frames to form motion vectors that predict the content of the target block. Any remaining spatial redundancies among frames are identified and exploited by encoding only the deviations between original and predicted blocks through quantization, transformation and entropy coding. The steps involved in the proposed technique are as follows:

1. Extract all frames in a video sequence.
2. Based on the variance of the features extracted from each frame, split it into blocks of 8 × 8, 16 × 16, 32 × 32 and 64 × 64.
3. Apply the AEGBM3D filter to avoid spatial redundancy and de-noise the frame.
4. Check the PSNR of the frame against the set threshold PSNR (extracted as the highest PSNR among all frames in the video).
5. If the PSNR falls above the threshold, go to step 7; otherwise go to step 6.
6. Apply the filtering technique until it upholds the PSNR above or equal to the set threshold; the BP technique then restrains the error signal such that it reduces bit-rate.
7. Encode the frame using HEVC encoding techniques (quantization, transformation and entropy encoding).
8. Go to step 2.

Experimental results
The performance of the proposed scheme is evaluated with the SHM reference software 12.1 [32]. In the implementation, we tested 4 standard database video sequences (Kimono, Park scene, Duckstakeoff and Traffic), shown in Fig. 4, and 2 real-time video sequences (gate and home), shown in Fig. 5, for quality and spatial scalability. We used the base layer and one enhancement layer for these two scalabilities. As per the SHM test conditions, spatial scalability is conducted for two ratios (×1.5 and ×2). The blocking of images starts from 8 × 8; if the image has a higher resolution, higher-level blocking is selected, otherwise it starts from 8 × 8. Every frame undergoes the AEGBM3D filter, after which the PSNR of each frame is determined. The highest PSNR among all frames in the entire video sequence is selected and set as the threshold PSNR for the input video. Until the PSNR of the input video reaches the threshold PSNR, the blocking of images proceeds from 8 × 8 to 64 × 64, controlled by the BP technique.
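The threshold selection described above can be sketched as follows, assuming the filtered frames are produced elsewhere by the AEGBM3D stage; the function names are hypothetical.

```python
import numpy as np

def psnr(original, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two 8-bit frames."""
    mse = np.mean((original.astype(float) - processed.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def select_threshold(originals, filtered):
    """Set the threshold PSNR to the highest per-frame PSNR in the
    sequence, as described in the text."""
    return max(psnr(o, f) for o, f in zip(originals, filtered))
```

A frame whose PSNR falls below this threshold would then be re-filtered (step 6 of the scheme) until the threshold is met.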
Fig. 4 Standard video sequences: a Kimono, b Park scene, c Duckstakeoff and d Traffic

Fig. 5 Real-time video sequences: a Gate and b Home

In our test, we compare the random access quality (RA (Q)), random access spatial (RA (S)), low-delay P (LDP) and low-delay B (LDB) profiles of SHVC. For quality scalability, we set the quantization parameters of the base layer and enhancement layer to (26, 22), (30, 26), (34, 30) and (38, 34). For spatial scalability, the quantization parameters are set to (22, 22), (26, 26), (30, 30) and (34, 34) for the base layer and enhancement layer in both spatial ratios (×1.5 and ×2). Table 1 illustrates the combined scalability under the four profiles RA (Q), RA (S), LDP and LDB, with an average increase of 0.16 dB in PSNR and an average decrease of 28% in bit-rate under ×1.5 spatial resolution. In the other case, there is an average increase of 0.25 dB in PSNR and an average decrease of 37% in bit-rate under ×2 spatial resolution, in comparison with the previously proposed method. Figures 6, 7, 8 and 9 show the rate-distortion (RD) curves for different video sequences under SNR and spatial scalabilities. Each RD curve plots bit-rate versus PSNR for each sequence in all configurations: random access—SNR, ×1.5 and ×2; low-delay P—×1.5 and ×2; and low-delay B—×1.5 and ×2. The results are also compared with the standard SHM 12.1 reference encoder for SHVC [32] and the HM 6.1 reference encoder for HEVC single-layer coding, along with the previously proposed method [27]. Although the single-layer HEVC coder outperforms scalable encoding in terms of bit-rate and PSNR for some video sequences, single-layer coding cannot provide the flexibility of scalability. Compared with the existing methods, the proposed scheme does better in terms of PSNR and bit-rate, except for a slight increase in encoding time. This encoding time is due to the BP scheme, which iterates to keep the error signal at a minimum and thereby reduces the number of bits.

Conclusion
This paper provides an overview of the latest scalable coding standard SHVC. SHVC adopts a scalable coding architecture in which the machine learning approach of BP is implemented. Efficient BP is achieved through the inter-layer reference picture processing modules. In contrast to the previous scalable coding standard SVC, the EL codec in SHVC can be built by repurposing existing single-layer HEVC codec cores. We compare the random access (RA), low-delay P (LDP) and low-delay B (LDB) profiles of H.265/SHVC. For quality scalability, we use quantization parameters of (26, 22), (30, 26), (34, 30) and (38, 34) for the base and enhancement layers. For spatial scalability, the quantization parameters are (22, 22), (26, 26), (30, 30) and (34, 34) for the base and enhancement layers in both spatial ratios (×1.5 and ×2). Examining all cases, we conclude that the combination of AEGBM3D and BP gives average increments of 0.16 dB and 0.25 dB in PSNR and average decrements of 28 and 37% in bit-rate for the ×1.5 and ×2 spatial ratios, respectively. In our future work, we will concentrate on adaptive PSNR threshold selection, so that the threshold PSNR is selected according to the resolution of the video sequence.
Table 1 Quality and spatial scalability in all 4 configurations [RA (Q), RA (S), LDP and LDB]

Fig. 6 RD curve for RA ×
Fig. 7 RD curve for LDP ×
Fig. 8 RD curve for RA SNR Kimono sequence
Fig. 9 RD curve for LDB ×

References

1. Pourazad, M.T., Doutre, C., Azimi, M., Nasiopoulos, P.: HEVC: the new gold standard for video compression. IEEE Consum. Electron. Mag. 1(3), 36–46 (2012)
2. Test Model for Scalable Extensions of High Efficiency Video Coding (HEVC). ISO/IEC JTC1/SC29/WG11 (2013)
3. Sze, V., Budagavi, M., Sullivan, G.J.: High Efficiency Video Coding (HEVC): Algorithms and Architectures. Integrated Circuits and Systems. Springer, New York (2014)
4. Wien, M.: High Efficiency Video Coding: Coding Tools and Specification. Signals and Communication Technology. Springer, Berlin (2014)
5. Sullivan, G.J., Ohm, J.-R., Han, W.-J., Wiegand, T.: Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012)
6. Schwarz, H., Marpe, D., Wiegand, T.: Overview of the scalable video coding extension of the H.264/AVC standard. IEEE Trans. Circuits Syst. Video Technol. 17(9), 1103–1120 (2007)
7. Zhang, H., Ma, Z.: Fast intra mode decision for high efficiency video coding (HEVC). IEEE Trans. Circuits Syst. Video Technol. 24(4), 660–668 (2014)
8. Cho, S., Kim, M.: Fast CU splitting and pruning for suboptimal CU partitioning in HEVC intra coding. IEEE Trans. Circuits Syst. Video Technol. 23(9), 1555–1564 (2013)
9. Shen, L., Liu, Z., Zhang, X., Zhao, W., Zhang, Z.: An effective CU size decision method for HEVC encoders. IEEE Trans. Multimed. 15(2), 465–470 (2013)
10. Xu, Y., Li, Q., Chen, J., Zhao, T.: Adaptive search range control in H.265/HEVC with error propagation resilience and hierarchical adjustment. SIViP 11, 1559–1566 (2017)
11. Li, W., Zhao, F., Zhang, E., Ren, P.: Lagrange optimization in high efficiency video coding for SATD-based intra-mode decision. SIViP 11, 1163–1170 (2017)
12. Pan, Z., Kwong, S., Sun, M.T., Lei, J.: Early MERGE mode decision based on motion estimation and hierarchical depth correlation for HEVC. IEEE Trans. Broadcast. 60(2), 405–412 (2014)
13. Hu, N., Yang, E.H.: Fast motion estimation based on confidence interval. IEEE Trans. Circuits Syst. Video Technol. 24(8), 1310–1322 (2014)
14. Xiong, J., Li, H., Wu, Q., Meng, F.: A fast HEVC inter CU selection method based on pyramid motion divergence. IEEE Trans. Multimed. 16(2), 559–564 (2014)
15. Shen, L., Liu, Z., Zhang, X., Zhang, Z.: An effective decision method for HEVC encoders. IEEE Trans. Multimed. 15(2), 465–470 (2013)
16. Shen, L., Zhang, Z., Liu, Z.: Adaptive inter-mode decision for HEVC jointly utilizing inter-level and spatiotemporal correlations. IEEE Trans. Circuits Syst. Video Technol. 24(10), 1709–1722 (2014)
17. Shen, L., Zhang, Z., An, P.: Fast CU size decision and mode decision algorithm for HEVC intra coding. IEEE Trans. Consum. Electron. 59(1), 207–213 (2013)
18. Shen, L., Zhang, Z., Liu, Z.: Effective CU size decision for HEVC intra coding. IEEE Trans. Image Process. 23(10), 4232–4241 (2014)
19. Tohidypour, H.R., Pourazad, M.T., Nasiopoulos, P.: Content adaptive complexity reduction scheme for quality/fidelity scalable HEVC. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (2013)
20. Content Adaptive Complexity Reduction Scheme for Quality/Fidelity Scalable HEVC. ISO/IEC JTC1/SC29/WG11, L0042 (2013)
21. Tohidypour, H.R., Pourazad, M.T., Bashashati, H., Nasiopoulos, P.: Fast mode assignment for the quality scalable extension of the high efficiency video coding (HEVC) standard: a Bayesian approach. In: Proceedings of the 6th Balkan Conference in Informatics (2013)
22. Tohidypour, H.R., Pourazad, M.T., Nasiopoulos, P.: Adaptive search range method for spatial scalable HEVC. In: Proceedings of the IEEE International Conference on Consumer Electronics (2014)
23. Zuo, X., Yu, L.: Fast mode decision method for all intra spatial scalability in SHVC. In: Proceedings of the IEEE Visual Communications and Image Processing (2014)
24. Huang, D.S., Bevilacqua, V., Premaratne, P. (eds.): Intelligent Computing Theory. Fast Mode and Depth Decision Algorithm for Intra Prediction of Quality SHVC. Springer, Cham (2014)
25. Bailleul, R., De Cock, J., Van de Walle, R.: Fast mode decision for SNR scalability in SHVC. In: Digest of Technical Papers, IEEE International Conference on Consumer Electronics (2014)
26. Aminlou, A., Lainema, J., Ugur, K., Hannuksela, M.M., Gabbouj, M.: Differential coding using enhanced inter-layer reference picture for the scalable extension of H.265/HEVC video codec. IEEE Trans. Circuits Syst. Video Technol. 24(11), 1945–1956 (2014)
27. Tohidypour, H.R., Pourazad, M.T., Nasiopoulos, P.: Probabilistic approach for predicting the size of coding units in the quad-tree structure of the quality and spatial scalable HEVC. IEEE Trans. Multimed. 18(2), 182–195 (2016)
28. Tohidypour, H.R., Pourazad, M.T., Nasiopoulos, P.: An encoder complexity reduction scheme for quality/fidelity scalable HEVC. IEEE Trans. Broadcast. 62(3), 664–674 (2016)
29. Biatek, T., Hamidouche, W., Travers, J.F., Deforges, O.: Optimal bitrate allocation in the scalable HEVC extension for the deployment of UHD services. IEEE Trans. Broadcast. 62(4), 826–841 (2016)
30. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)
31. Dabov, K., Foi, A., Egiazarian, K.: Video denoising by sparse 3D transform-domain collaborative filtering. In: Proceedings of the 15th European Signal Processing Conference (2007)
32. SHM Software. https://hevc.hhi.fraunhofer.de/svn/svn_SHVCSoftware/tags/SHM-12.1/. Accessed 03/01/2017

Signal, Image and Video Processing (2018) 12:809–817. https://doi.org/10.1007/s11760-018-1265-1