A Generalized Rate-Distortion-λ Model Based HEVC Rate Control Algorithm

Abstract

The High Efficiency Video Coding (HEVC/H.265) standard doubles the compression efficiency of the widely used H.264/AVC standard. For practical applications, rate control (RC) algorithms for HEVC need to be developed. Based on the R-Q, R-ρ or R-λ models, rate control algorithms aim at encoding a video clip/segment to a target bit rate accurately with high video quality after compression. Among the various models used by HEVC rate control algorithms, the R-λ model performs the best in both coding efficiency and rate control accuracy. However, compared with encoding with a fixed quantization parameter (QP), even the best rate control algorithm [1] still underperforms when comparing the video quality achieved at identical average bit rates. In this paper, we propose a novel generalized rate-distortion-λ (R-D-λ) model for the relationship between rate (R), distortion (D) and the Lagrangian multiplier (λ) in rate-distortion (RD) optimized encoding. In addition to a well-designed hierarchical initialization and coefficient update scheme, a new model-based rate allocation scheme composed of amortization, smooth window and consistency control is proposed for better rate allocation. Experimental results from implementing the proposed algorithm in the HEVC reference software HM-16.9 show that the proposed rate control algorithm achieves average BDBR savings of 6.09%, 3.15% and 4.03% for the random access (RA), low delay P (LDP) and low delay B (LDB) configurations respectively, compared with the R-λ model based RC algorithm [1] implemented in HM. The proposed algorithm also outperforms the state-of-the-art algorithms, while rate control accuracy and encoding speed are hardly impacted.


arXiv [cs.MM], Nov

Minhao Tang, Jiangtao Wen, Yuxing Han
Tsinghua University, Beijing, China; South China Agricultural University, Guangzhou

Index Terms: HEVC, Rate Control, ABR, R-D-λ Model

I. INTRODUCTION

HEVC [2] is the latest video compression standard from ITU and MPEG and the successor to H.264/AVC [3]. It has been widely observed that HEVC saves 50% of the bits on average compared with H.264/AVC at the same visual quality, at the cost of much higher encoding complexity [4]. Although video can be encoded in the constant quantization parameter (CQP) mode, also known as the non-RC mode, practical bandwidth-constrained applications more commonly use rate control to encode an input video to a target bit rate while achieving good video quality after compression.

Rate control can be generally categorized into two types, constant bit rate (CBR) control and average bit rate (ABR) control. ABR sets a target average bit rate for the entire video or for every single video segment, while allowing the bit rate to vary among different parts of the video according to the complexity of those parts. CBR, on the other hand, requires a strictly uniform output bit rate for every time period, typically one second. At a given bit rate, ABR usually provides a higher quality after compression than CBR, as CBR sacrifices a lot of coding efficiency to keep the bit rate constant over time. ABR is usually used in coding efficiency oriented applications like video on demand and video storage, while CBR is often used in jitter-sensitive applications like video calls and satellite based video communication. This paper focuses on proposing a novel ABR algorithm to improve the coding efficiency of HEVC, so only ABR is discussed and tested in this paper.

This work is supported by Boyan Information Technology Ltd and Natural Science Foundation of China (Project Number 61521002).

In general, rate control algorithms need to achieve a high bit rate accuracy, as measured by bit rate error, while achieving good video coding efficiency, which is generally measured by BDBR [5]. On a high level, rate control consists of two steps,
1) allocating the target bit rate across and inside frames of the input sequence,
2) selecting proper coding parameters to meet the target bit rate with good video quality.

Existing rate control algorithms for HEVC typically use one of three rate estimation models, namely the R-Q model [6], [7], the R-ρ model [8], [9], and the R-λ model [10], [1]. These models were designed to predict the output bit rate R after compression using features such as the quantization step Q in the R-Q model, the percentage of zero-valued transformed coefficients ρ in the R-ρ model and the Lagrangian multiplier λ in the R-λ model.

Experiments show that R-λ model based rate control algorithms [10], [1] significantly outperform the R-Q model and R-ρ model based algorithms.
The first R-λ model based rate control algorithm [10] is 15% better in coding efficiency than the previous state-of-the-art R-Q model based rate control algorithm [7], with a nearly halved bit rate error. As a result, R-λ model based algorithms [10], [1] have been adopted and integrated in the HEVC reference software. However, as mentioned in [10], [1], the coding efficiency of the current R-λ model based rate control algorithm is still much lower than that of the CQP mode. In addition, Wen et al [11] pointed out that the current R-λ model based rate control algorithm might fail at scene changes.

To improve the performance of R-λ model based rate control, many algorithms [12], [13], [14], [15], [16], [11], [17] have been proposed to improve the rate allocation and/or the model coefficient update mechanisms. These algorithms achieve a higher coding efficiency than the original R-λ model based RC algorithm proposed in [10], [1]. However, they mainly focus on improving the rate allocation and model coefficient update based on the R-λ model, without further improving the R-λ model itself.

In this paper, we propose a novel generalized rate-distortion-λ model to better model the relationship between rate, distortion and λ. The proposed model improves the accuracy of model fitting by 56% over the R-λ model. Besides the new model, the well-designed hierarchical initialization and the model coefficient update scheme, a new model-based rate allocation scheme is proposed for better rate allocation. The rate allocation module includes amortization for I frames, smooth window based compensation and consistency control on QP value selection. Experimental results from implementing the proposed algorithm in the HEVC reference software HM-16.9 show that the proposed rate control algorithm produces average BDBR savings of 6.09%, 3.15% and 4.03% for the Random Access (RA), Low Delay P (LDP) and Low Delay B (LDB) configurations respectively, compared with the R-λ model based RC algorithm in [1], i.e. the default RC algorithm in HM. The proposed algorithm also outperforms the state-of-the-art rate control algorithms, while the rate control accuracy and encoding speed are hardly impacted.

The remainder of this paper is organized as follows. Sec. II reviews the research on HEVC rate control. Sec. III describes the proposed algorithm in detail. Sec. IV presents experimental results. Sec. V concludes the paper.

II. RELATED WORK

A. Rate Control Models

In HEVC encoding, the quantization parameter (QP) and the Lagrangian multiplier λ for rate-distortion optimization (RDO) are two important parameters that directly determine the output video quality and bit rate after compression. QP decides the quantization step used to quantize the residual after transform, which determines the distortion of each prediction mode as well as the residual after quantization. λ, as a function of QP, defines the following RDO target function in [18],

$$ J = \min (D + \lambda \cdot R), \tag{1} $$

where J is also known as the RD cost. It has been widely agreed that the following logarithmic relationship [19] between QP and λ achieves the best encoding efficiency statistically,

$$ QP = c_1 \times \ln(\lambda) + c_2, \tag{2} $$

where $c_1$ and $c_2$ are variables related to the video characteristics and compression performance. Based on this relationship, rate control algorithms only need to decide either QP or λ. Various models are then used to estimate the coding parameters according to the target bit rate.

The R-Q model assumes that the quantization step Q has a direct correspondence to the coding complexity and can accurately estimate the number of bits consumed using the following quadratic model [6],

$$ R = aQ^{-1} + bQ^{-2}, \tag{3} $$

QP = 12.… .… × log(Q / .…), (4)

where a and b are two parameters related to video content that are updated as encoding proceeds. Choi et al [7] proposed a pixel based unified R-Q model based rate control algorithm for HEVC, which was adopted and implemented in the HEVC reference software HM-8.0. As discussed, Q can only determine the distortion and residual of each prediction mode. However, λ decides which prediction mode to use, while the output bit rate of the current prediction is determined by CABAC. Therefore, the indirect relationship between R and Q cannot achieve a high accuracy in rate estimation.
In addition, the non-monotonic quadratic model used in the R-Q model is hard to update accurately during the encoding process, which also reduces the coding efficiency.

Another class of rate control algorithms, namely ρ domain rate control [20], [8], [9], assumes a linear relationship, shown in Equation (5), between the output bit rate and the percentage of zeros among the quantized transform coefficients, denoted as ρ,

$$ R = \theta \cdot (1 - \rho). \tag{5} $$

ρ domain rate control algorithms were popular for H.264. However, the HEVC standard introduces a flexible quad-tree coding unit (CU) partition scheme and the skip prediction method, leading to a significant difference in the distribution of zeros after transform and quantization. In addition, the extra bits consumed by newly introduced syntax elements also make the linear relationship between ρ and output bit rate less accurate than in H.264. As a result, ρ domain rate control is rarely used in HEVC.

As the state of the art, Li et al [10] proposed a novel R-λ model for HEVC, where the relationship between distortion and rate is modelled as a hyperbolic function in Equation (6). Accordingly, the relationship between bits per pixel (bpp) and λ is also hyperbolic after unit conversion, as given in Equation (7),

$$ D = C R^{-K}, \tag{6} $$

$$ \lambda = \alpha \cdot bpp^{\beta}, \tag{7} $$

where C, K, α and β are content related parameters. Only α and β need to be estimated and updated during encoding. Similar to the hierarchical structure used in HEVC, λ domain rate control algorithms also define their own hierarchy, where frames in the same frame-reference hierarchy share the same set of model coefficients.
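The mapping in Equation (7) can be illustrated with a minimal Python sketch. It is not the HM implementation. The default coefficients are the initial values that this paper attributes to [10] (α = 3.2003, β = -1.367), and the function names and sample bpp targets are hypothetical, chosen only for illustration.

```python
# Minimal sketch of the R-lambda mapping of Eq. (7); not the HM code.
# Defaults are the initial values the paper attributes to [10].
ALPHA_INIT = 3.2003
BETA_INIT = -1.367


def lambda_from_bpp(bpp, alpha=ALPHA_INIT, beta=BETA_INIT):
    """Eq. (7): lambda = alpha * bpp**beta (bpp must be positive)."""
    if bpp <= 0:
        raise ValueError("bpp must be positive")
    return alpha * bpp ** beta


def bpp_from_lambda(lam, alpha=ALPHA_INIT, beta=BETA_INIT):
    """Inverse of Eq. (7): bpp = (lambda / alpha)**(1 / beta)."""
    if lam <= 0:
        raise ValueError("lambda must be positive")
    return (lam / alpha) ** (1.0 / beta)


if __name__ == "__main__":
    # Hypothetical target rates in bits per pixel.
    for target in (0.01, 0.05, 0.5):
        lam = lambda_from_bpp(target)
        print(f"bpp={target:<5} lambda={lam:.3f} "
              f"round trip bpp={bpp_from_lambda(lam):.5f}")
```

Because β is negative, a smaller bit budget maps to a larger λ, i.e. a stronger penalty on rate in the RD cost of Equation (1).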
During encoding, α and β are updated using a least mean square based gradient descent scheme. Experimental results [10] show that the R-λ model is able to properly model the relationship between distortion and rate, with a coefficient of determination (r²) value around 0.995. Compared with the R-Q model based rate control algorithm [7], the former state of the art, the R-λ model based rate control algorithm [10] improves the coding efficiency by 15.9% for LD and 24.6% for RA respectively. As a result, the R-λ model based rate control algorithm in [10] was adopted and implemented in the HEVC reference software HM-10.0. However, experimental results in [10] also show that the coding efficiency of the R-λ model based algorithm is still far inferior to CQP.

B. Bit Allocation

The R-λ model in Equation (7) is highly effective for output rate estimation and for determining the QP value to hit a target bit rate. However, the question of optimally allocating the total bit rate budget to frames and/or CUs for a higher video coding efficiency remains open.

Li et al [1] proposed a largest coding unit (LCU)-level separate model based block level bit allocation scheme, which achieves a higher rate control accuracy and also a higher coding efficiency (2.8% and 3.9% for LD and RA) than [10]. As a result, the algorithm was incorporated into the HEVC reference software HM-11.0.

As the LD and RA configurations are very different from each other in rate distribution, some rate allocation algorithms were designed to work with only one of the two configurations. For example, [15] was predominantly designed to work very well in the RA mode, whereas [12] is among the best for LD. RA uses a more complex reference hierarchy that needs to adapt to input content, so it is generally more difficult for RC algorithms to work well with RA.

Xie et al [13] proposed a temporal dependent bit allocation scheme which allocates bit rate according to the complexity of each coding tree unit (CTU). Results show a coding efficiency improvement of 1.78% over [10] for LD. Wang et al [14] proposed a new relationship between the distortion and λ for better rate regulation and a higher consistency in quality, with an average gain of 0.37 dB in PSNR. Li et al [12], [16] proposed a recursive Taylor expansion method to iteratively estimate a closed form of the optimal rate allocation, which improved the coding efficiency by 2.2% and 2.4% on average over [1] for LDP and LDB respectively.

As to the RA configuration, Song et al [21] proposed a group of pictures (GOP) level rate allocation scheme to accurately match the HEVC GOP coding structure in RA, which achieves a coding efficiency that is 0.2% higher than [1].
Gong et al [22] proposed a temporal-layer-motivated lambda domain picture level rate control algorithm to better estimate the influence of each layer, which leads to an average gain of 3.87% in coding efficiency compared with [1]. He et al [15] proposed an inter-frame dependency based dynamic programming method for frame level bit allocation, which improves the coding efficiency by 5.19% on average for RA over [1], with an increase of 0.41% in encoding time.

C. Other methods

Besides rate allocation, various schemes have been proposed for better RC performance, such as adaptive quantization, new RC models and multi-pass encoding.

Adaptive quantization is another means of bit allocation, which usually first uses traditional RC algorithms to decide a central QP value. The QP values for frames and CUs are adjusted later. Tang et al [23] proposed a Hadamard energy feature for adaptive quantization and achieved a 3.3% gain in coding efficiency compared with [1].

Lee et al [17] proposed a Laplacian probability density function to derive a new model between rate and distortion. The coding performance is slightly better than [10] but worse than [1], so this model was not adopted by the HEVC reference software.

Besides methods that only use features inside a frame, there are also some multi-pass methods, which increase the coding efficiency at the cost of higher latency and higher computing complexity. Wen et al [11] proposed a pre-compression based double update scheme to better handle scene changes, which achieves an average gain in PSNR of 0.1 dB for common single-scene test sequences and up to 4.5 dB for complicated multi-scene videos. The macroblock-tree algorithm proposed in [24] was designed to adjust the QP value according to the frequency at which a block is directly and indirectly referenced. An extra pass of encoding is required in the macroblock-tree algorithm, which leads to a higher latency and a higher complexity. Based on the macroblock-tree algorithm, Yang et al [25] proposed a low-delay source distortion temporal propagation model, which improves the coding efficiency of the H.264 reference software JM by 15%. Fiengo et al [26] proposed a convex optimization based recursive R-D model, which achieves a gain of 12% in coding efficiency compared with [10], with a 10x-50x higher encoding time.
Ropert et al [27], [28] proposed a sequential two-pass method for better rate allocation, which results in an increase of 16% in coding efficiency at the cost of an average increase of 57% in encoding time compared with [1].

III. THE PROPOSED ALGORITHM

In this section, we describe in detail the proposed generalized rate-distortion-λ model and the proposed rate control algorithm based on the new model, including the hierarchical initialization, the least mean square based update scheme and the rate allocation module.

A. Rate-Distortion-λ Model

The core of the R-λ model can be found in Equations (6) and (7). Though the model performs well in R-D fitting, there are two implied assumptions that are invalid under boundary conditions in real world applications, namely, (a) infinite bit rate when distortion is zero, and (b) infinite distortion when bit rate is zero.

HEVC provides a lossless encoding mode, where the QP value is 4 and the corresponding quantization step is 1. Therefore, the output rate of lossless encoding is the minimum bit rate for the video to be encoded without distortion. Such a lossless bit rate is not much greater than the output bit rates of lossy encodings. For example, the lossless bit rate of Kimono in class B is 3.3 bpp, and that of FourPeople in class E is 2.2 bpp. A common target bit rate of lossy encoding usually lies in the range between 0.01 bpp and 0.5 bpp, which is only one to two orders of magnitude smaller than the rate of lossless encoding. Therefore, such boundary cases cannot be ignored in the model. On the other side, zero rate encoding can be approximated by only storing the average value of the video, where the output distortion is the variance of the video.

These two boundary cases show that a more realistic rate-distortion model must intersect both the rate axis and the distortion axis. This is, however, not true for the R-λ model, which manifests as a loss of model accuracy, especially at very low and very high bit rates.

According to the discussion above, the proposed model is given as follows,

$$ D = \max (0, C (R + B)^{-K} - T) \tag{8} $$

$$ \approx C (R + B)^{-K} - T, \tag{9} $$

where C and K are parameters similar to their namesakes in the traditional R-λ model that represent the basic characteristics of the video. According to the results in [10], K is usually around 1. B is the parameter describing the intercept on the distortion axis. The distortion of zero bit rate encoding can be described by $C B^{-K}$, which equals the variance of the video. As a result, B is usually much smaller than a typical target bit rate.
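The boundary behaviour of Equation (8) can be checked numerically with a short Python sketch. All numeric values below (C, K, B) are hypothetical and chosen only for illustration, not fitted values from the paper. The rate is expressed in bpp, and 3.3 bpp is the lossless rate quoted above for Kimono. The helper `offset_for_zero_distortion` is an assumed name; it simply solves Equation (8) for the T that makes the distortion vanish at a chosen rate.

```python
# Sketch comparing Eq. (6) with the generalized model of Eq. (8).
# Parameter values are hypothetical and only illustrate the two intercepts.


def distortion_classic(rate, c, k):
    """Eq. (6): D = C * R**(-K); grows without bound as rate -> 0."""
    return c * rate ** (-k)


def distortion_generalized(rate, c, k, b, t):
    """Eq. (8): D = max(0, C * (R + B)**(-K) - T)."""
    return max(0.0, c * (rate + b) ** (-k) - t)


def offset_for_zero_distortion(rate_zero, c, k, b):
    """Solve Eq. (8) for T so that D = 0 at rate_zero."""
    return c * (rate_zero + b) ** (-k)


if __name__ == "__main__":
    c, k, b = 1.0, 1.0, 0.005       # hypothetical C, K, B
    r_lossless = 3.3                # bpp, lossless rate quoted for Kimono
    t = offset_for_zero_distortion(r_lossless, c, k, b)
    for rate in (0.0, 0.01, 0.5, r_lossless):
        d_new = distortion_generalized(rate, c, k, b, t)
        d_old = distortion_classic(rate, c, k) if rate > 0 else float("inf")
        print(f"rate={rate:<5} old model D={d_old:<10.4f} new model D={d_new:.4f}")
```

With these settings the generalized curve gives a finite distortion at zero rate and zero distortion at the lossless rate, whereas Equation (6) does neither.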
T is the parameter for the intercept on the rate axis. It equals $C (R_{lossless} + B)^{-K}$ and is usually one to two orders of magnitude smaller than typical output distortion. The model can be simplified to Expression (9) if only lossy encoding is considered.

The modelling performance between rate and distortion, i.e. how closely the model can fit the regression curve to the actual data, determines how accurately the model can be updated. Therefore, a fitting experiment was conducted to evaluate the expressive power of the proposed model in describing the relationship between rate and distortion. The test sequences in the HEVC common test condition [29] (class A to E, 20 videos in total) were encoded in the CQP mode with QP values from 4 to 51 to cover all possible output rates. When QP is set to 4, the quantization step is 1, which represents lossless encoding and also the highest possible output rate. 51 is the greatest QP value allowed in HEVC, which leads to the lowest possible output rate. For each video sequence, the relationship between distortion (measured by mean squared error, MSE) and rate (measured by bpp) was fitted using the rate-distortion model of [10] in Equation (6) and the proposed model defined in Equation (9).

TABLE I: RD Curves Fitting Performance

QP Range   Model      r² worst   r² avg   r² std   RMSE worst   RMSE avg   RMSE std
4...51     [10]       0.9504     0.9925   0.0106   11.24        3.03       2.91
4...51     Proposed   0.9777     0.9966   0.0055   7.62         1.32       1.62
4...22     [10]       0.9046     0.9831   0.0242   1.59         0.32       0.36
4...22     Proposed   0.9763     0.9940   0.0074   0.36         0.15       0.11
17...37    [10]       0.8904     0.9880   0.0252   9.89         1.47       2.10
17...37    Proposed   0.9603     0.9941   0.0108   2.65         0.65       0.72
32...51    [10]       0.9818     0.9978   0.0040   8.24         2.14       2.37
32...51    Proposed   0.9880     0.9992   0.0027   6.90         0.76       1.47

Table I gives the fitting results of the model used in the R-λ model and of the proposed model, with the coefficient of determination (r²) and the root mean square error (RMSE) used as the metrics. r² describes how close the actual data is to the fitted regression curve. RMSE is the standard deviation of the prediction residuals. A higher r² value or a lower RMSE value for the same set of data suggests a better fit.

In the experiment, four QP ranges (namely, full QP range: 4...51, low QP range: 4...22, middle QP range: 17...37, high QP range: 32...51) were tested to compare the expressive power of the two models over different QP ranges. Each of the sub-ranges contains around 20 points to guarantee a similar fitting difficulty. As shown in Table I, for the full QP range (4...51), the proposed model improves the r² value from 0.9925 to 0.9966 on average, with RMSE reduced from 3.03 to 1.32, i.e. a 56% reduction. It should be noted that the r² values in this experiment are lower than the results in [10] because 48 data points (QP from 4 to 51) were used in our experiment, while only four points (QP in {22, 27, 32, 37}) were used in [10]. Furthermore, the proposed model improves both the average and worst cases for the full QP range, while the much lower standard deviations (std) of r² and RMSE show that the proposed model is more robust than the model in [10] when handling different kinds of videos.

In addition, the results in Table I show that the proposed model outperforms the R-λ model for all three sub-ranges of QP values. The R-λ model sometimes produces very low r² values for low QP values. As discussed above, a common target bit rate of lossy encoding is only one to two orders of magnitude smaller than the rate of lossless encoding. As a result, the proposed model greatly benefits from the refinement of the zero-distortion boundary case and therefore produces a much higher and more stable fitting performance. On the other side, the distortion of zero rate encoding, i.e. the variance of the video, is usually much greater than the distortion after compression, so the gain in r² for the high QP range due to the zero-rate boundary case is expected to be smaller than that for the low and middle QP ranges. The R-λ model is already very accurate for the high QP range, while the proposed model still provides a similar improvement in RMSE for the high QP range as for the other ranges. It should be noted that r² and RMSE both increase for the high QP range compared with the low and middle QP ranges, because RMSE is proportional to distortion. RMSE therefore increases for the high QP range due to a much greater distortion, in spite of a higher r² value.

[Fig. 1: R-D Curves Fit Using the Two Different Models. Panels: (a) PeopleOnStreet 1600p, (b) PartyScene 480p, (c) BlowingBubbles 240p. Vertical axis: MSE. Legend: actual data, old model, new model.]

Fig. 1 gives three examples of the RD curves fitted using the two models.
The original data points are plotted as red circles, while the curves fitted using the old model and the proposed model are plotted as a blue dashed line and a black solid line respectively. Note that the model coefficients were estimated using all data points in the full QP range, while only part of the data points (QP 15 to 44) is plotted in Fig. 1 to keep the plot legible. As can be seen from the figure, the R-λ model performs well for high QP values and starts to deteriorate as the QP value decreases, while the proposed model accurately predicts the RD relationship in all cases.

Based on the proposed rate-distortion model defined in Equation (9), the proposed R-D-λ model can be derived as follows,

$$ \lambda = -\frac{\partial D}{\partial R} = C K (R + B)^{-K-1} \tag{10} $$

$$ = \alpha (bpp + \gamma)^{\beta}, \tag{11} $$

$$ bpp = \left( \frac{\lambda}{\alpha} \right)^{1/\beta} - \gamma, \tag{12} $$

$$ \alpha = C K, \tag{13} $$

$$ \beta = -K - 1, \tag{14} $$

$$ \gamma = \frac{B}{W \times H \times Fr}, \tag{15} $$

where λ is the negative of the derivative of distortion with respect to rate. Equations (11) and (12) are two other forms of Equation (10) with the unit of rate converted from bits to bpp. α, β and γ are shorthand for the parameters related to the video characteristics, with definitions given in Equations (13)-(15). These three parameters are updated using the algorithm proposed in Sec. III-C as coding proceeds. W, H and Fr are the width, height and frame rate of the video.

Based on the model definition above, λ can be calculated from the target bit rate, and the value of QP can then be calculated using the logarithmic function defined in Equation (2). The values of $c_1$ and $c_2$ in Equation (2) were determined through a tuning experiment where various values were tested in combination with the remaining parts of the proposed algorithm. It was found that the following QP-λ relationship provided the best coding efficiency for the HEVC common test condition, and it is therefore used in the proposed algorithm,

QP = round(4.… × ln(λ) + 14.…). (16)

TABLE II: RA Coding Structure in HEVC

FrameNum   POC   Level   Ref Num   QP Offset   λ Multiplier
1          8     1       3         1           0.442
2          4     2       3         2           0.3536
3          2     3       4         3           0.3536
4          1     4       4         4           0.68
5          3     4       4         4           0.68
6          6     3       3         3           0.3536
7          5     4       4         4           0.68
8          7     4       4         4           0.68

TABLE III: LD Coding Structure in HEVC

FrameNum   POC   Level   Ref Num   QP Offset   λ Multiplier
1          1     3       4         3           0.4624
2          2     2       4         2           0.4624
3          3     3       4         3           0.4624
4          4     1       4         1           0.578

B. Hierarchical Initialization

Similar to [10], in the proposed algorithm, frames of the same reference hierarchy share the same set of model coefficients, which is updated after each frame is encoded. All CUs inside a frame share the model of that frame without separate maintenance. The update mechanism of the model coefficients is introduced in Sec. III-C.

In [10], the initial values of α and β are set to 3.2003 and -1.367 respectively, which are the averaged fitted values over the sequences in the HEVC common test condition. It is agreed that the initial values of the model coefficients do not have a significant overall influence on the coding efficiency as long as the model coefficients can be properly updated during encoding. However, it is still beneficial to set the initial values according to the reference frame hierarchy.

A hierarchically structured GOP mechanism, with frames at different levels of the hierarchy using different numbers of reference frames and different QP values, was proposed in HEVC. In general, a smaller QP value is used for frames of a lower level, which are referenced more and reference other frames less. The hierarchical structures of HEVC (RA in Table II, LD in Table III) were carefully designed and fine-tuned for a statistically optimal coding efficiency under the HEVC common test condition. Intra-picture coded frames (I frames) belong to frame level 0, while inter-picture coded frames (P frames and B frames) are categorized into frame levels 1 to 4 in RA. A lower frame level represents a higher importance in the reference structure and a lower dependence on other frames in coding, and therefore usually a lower coding efficiency. Fig. 2 gives the RD curves of different frame levels of "PartyScene" from class C, which was encoded by HM-16.9 using the RA configuration with QP values from 22 to 37. As shown in Fig. 2, for a given output quality, frames of frame level 4 only require about 25% of the bit rate needed by frames of frame level 1. Therefore, it is not proper to set the initial values of all frame levels to the same value.

[Fig. 2: R-D Curves of Different Frame Levels. Vertical axis: MSE. Legend: frame level 0, frame level 1, frame level 2, frame level 3, frame level 4.]

To derive a closed form of the optimal relationship between model parameters and frame level, it is assumed that when a video sequence is encoded by encoder A to a bit rate of R and a certain quality, encoder B would encode that video to a similar quality if the output bit rate is dR, where d is the relative coding efficiency between encoder B and encoder A over a range of bit rates. This assumption is widely used in many encoder evaluation schemes, such as the BDBR metric [5]. In this paper, it is assumed that the linear scaling of the difference in coding efficiency between two encoders is applicable to the coding efficiencies of two frame levels as well. The distortions when encoding a frame at two different frame levels (i and j) by the same encoder can be modelled by

$$ D = C_i (R + B_i)^{-K_i} - T_i \tag{17} $$

$$ = C_j (dR + B_j)^{-K_j} - T_j, \tag{18} $$

where d is the relative coding efficiency between level j and level i. $T_i$ is the same parameter as in Expression (9), which equals $C_i (R_{i,lossless} + B_i)^{-K_i}$ and is one to two orders of magnitude smaller than the distortion in lossy encoding cases. The difference between $T_i$ and $T_j$ is much smaller than common distortion and can therefore be ignored. As introduced earlier, B is usually much smaller than practical bit rates, so Expressions (19) and (21) can be approximated using Taylor expansion. According to the results in [10], K is usually around 1, so K does not affect the order of magnitude of $B/R$. So $B_i B_j / R^2$ in Expression (22) is much smaller than 1 and can therefore be ignored to get the final approximation.
The entire approximation is described as follows,

$$ C_i (R + B_i)^{-K_i} \approx C_j (dR + B_j)^{-K_j} \tag{19} $$

$$ C_i R^{-K_i} \left( 1 - \frac{K_i B_i}{R} \right) \approx C_j (dR)^{-K_j} \left( 1 - \frac{K_j B_j}{dR} \right) \tag{20} $$

$$ R^{K_j - K_i} \approx \frac{C_j}{C_i} d^{-K_j} \, \frac{1 - \frac{K_j B_j}{dR}}{1 - \frac{K_i B_i}{R}} \tag{21} $$

$$ \approx \frac{C_j}{C_i} d^{-K_j} \left( 1 - \frac{K_i K_j B_i B_j}{d R^2} \right) \tag{22} $$

$$ \approx \frac{C_j}{C_i} d^{-K_j} \tag{23} $$

TABLE IV: C for Different Frame Levels in RA

Frame Level    1 (P frame)   2   3   4
Ref Distance   8             4   2   1
C              …             …   …   …

To ensure that Expression (23) roughly holds for a wide range of rates and thus satisfies the assumption, $K_i$ and $K_j$ must be very close to each other, while the C values for different frame levels roughly follow a reciprocal relationship with the relative coding efficiency of each frame level,

$$ \frac{C_j}{C_i} \approx d^{K} \approx d. \tag{24} $$

Wang et al [30] conclude that coding efficiency decreases as a logarithmic function of the distance between the current frame and the reference, which is highly consistent with the fitted results of the RD curves of different frame levels given in Table IV. It should be noted that the logarithmic relationship is not a good fit for frame level 1 (P frames), as frames in level 1 only allow uni-directional prediction, while frames of higher frame levels can be encoded using bi-directional prediction, which improves the coding efficiency.

Moreover, the hierarchical initialization scheme for γ can be obtained using the zero rate encoding case, where the theoretical maximum distortion is the variance, which is similar for frames within one scene. Therefore, the hierarchical initialization scheme for γ can be derived as follows,

$$ C_i B_i^{-K} \approx C_j B_j^{-K} \tag{25} $$

$$ \left( \frac{B_j}{B_i} \right)^{K} \approx \frac{C_j}{C_i} \approx d^{K} \tag{26} $$

$$ \frac{\gamma_j}{\gamma_i} \approx \frac{C_j}{C_i} \approx d \tag{27} $$

In summary, the model coefficients of different frame levels are initialized hierarchically, while frames of the same frame level share the same set of model coefficients.
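The proportional initialization implied by Equations (24) and (27) can be sketched in Python as follows. The constants are the RA settings the paper states for this scheme (β = -1.35, the ratio 4.2:3:2:1 for frame levels 1 to 4, centre values 4.4 and 0.005 at frame level 2, and γ clipped to 0.1 times the target bit rate). The function names, the dictionary layout and the choice to express the target bit rate in bpp are assumptions made for illustration, not the authors' implementation.

```python
# Sketch of hierarchical coefficient initialization for the RA configuration.
# Constants come from the paper's text; structure and names are hypothetical.
RA_RATIO = {1: 4.2, 2: 3.0, 3: 2.0, 4: 1.0}   # relative alpha/gamma per frame level
CENTER_LEVEL = 2                               # level that holds the centre values
ALPHA_CENTER = 4.4
GAMMA_CENTER = 0.005
BETA_INIT = -1.35
GAMMA_CLIP_FACTOR = 0.1                        # gamma <= 0.1 * target bit rate


def init_coefficients(target_bpp):
    """Return {frame_level: (alpha, beta, gamma)} for frame levels 1 to 4."""
    coeffs = {}
    for level, ratio in RA_RATIO.items():
        scale = ratio / RA_RATIO[CENTER_LEVEL]
        alpha = ALPHA_CENTER * scale
        # Assumption: the target bit rate used for clipping is given in bpp.
        gamma = min(GAMMA_CENTER * scale, GAMMA_CLIP_FACTOR * target_bpp)
        coeffs[level] = (alpha, BETA_INIT, gamma)
    return coeffs


def estimate_lambda(bpp, alpha, beta, gamma):
    """Eq. (11): lambda = alpha * (bpp + gamma)**beta."""
    return alpha * (bpp + gamma) ** beta


if __name__ == "__main__":
    target = 0.1   # hypothetical target bit rate in bpp
    for level, (alpha, beta, gamma) in init_coefficients(target).items():
        lam = estimate_lambda(target, alpha, beta, gamma)
        print(f"level {level}: alpha={alpha:.4f} gamma={gamma:.5f} lambda={lam:.3f}")
```

For the same bpp, the sketch assigns a larger λ to lower frame levels, which reflects their lower coding efficiency in the derivation above.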
For the RA configuration, all frame levels share the same β value of -1.35, while α and γ follow a fixed proportional relation of [4.2:3:2:1] for frame levels 1 to 4, taken from the results in Table IV, with center values (frame level 2) set to 4.4 and 0.005 for α and γ respectively. For the LD configuration, β is also set to -1.35, while α and γ are set to 2.4 and 0.005 for frame levels 1 to 3. The value 0.005 is the average of the fitted values of γ over the sequences in the HEVC common test condition. A γ value of 0.005 bpp (i.e. 140 kbps for 720p@30fps and 60 kbps for 480p@30fps) is usually much lower than practical target bit rates. It is still possible that γ is comparable with the target bit rate for very simple videos. In those cases, the assumption that $B$ is much smaller than the target bit rate no longer holds and approximations like Expression (21) fail. To preserve this relationship, the initial γ is clipped to an upper bound of 0.1 times the target bit rate.

C. Model Parameters Update Scheme

In the proposed rate control algorithm, frames of the same frame level share one set of model coefficients, which is updated after each frame is encoded. Unlike the LCU-level separate model proposed in [1], the proposed algorithm only allows separate models for different frame levels. CUs inside a frame share the model of that frame, which is not updated until the frame is completely encoded. In the proposed algorithm, each frame is first allocated a target bit rate, denoted as $bpp_t$. The estimated coding parameter λ (denoted as $\hat{\lambda}$) can be calculated using

$$\hat{\lambda} = \alpha_i (bpp_t + \gamma_i)^{\beta_i}, \tag{28}$$

where $i$ is the frame level of the current frame. The corresponding QP value ($\hat{QP}$) can be calculated using Equation (16).

After the current frame is encoded, the output bit rate ($bpp$) can be observed, which is used to update the proposed R-D-λ model using the least mean square (LMS) update rule. The λ estimation for hitting a target bit rate can be considered a regression problem, where $bpp$ is the input variable and λ is the estimated output. Then ($bpp$, $\hat{\lambda}$) becomes an actual data point of the model, while $\tilde{\lambda}$ is the estimated biased output, which can be calculated as follows.

$$\tilde{\lambda} = \alpha_i (bpp + \gamma_i)^{\beta_i}. \tag{29}$$

To make the power function easy to update by gradient descent, a squared logarithmic error ($e$) is used as follows,

$$e = \frac{1}{2}\left(\ln\hat{\lambda} - \ln\tilde{\lambda}\right)^2, \tag{30}$$
$$\ln\tilde{\lambda} = \ln\alpha_i + \beta_i \ln(bpp + \gamma_i) \tag{31}$$

The derivatives of $e$ with respect to α, β and γ can be calculated as follows,

$$\frac{\partial e}{\partial \alpha_i} = \frac{\partial e}{\partial \ln\tilde{\lambda}} \frac{\partial \ln\tilde{\lambda}}{\partial \ln\alpha_i} \frac{\partial \ln\alpha_i}{\partial \alpha_i} = -\left(\ln\hat{\lambda} - \ln\tilde{\lambda}\right) \frac{\partial \ln\tilde{\lambda}}{\partial \ln\alpha_i} \frac{\partial \ln\alpha_i}{\partial \alpha_i} = -\left(\ln\hat{\lambda} - \ln\tilde{\lambda}\right) \frac{1}{\alpha_i} \tag{32}$$
$$\frac{\partial e}{\partial \beta_i} = -\left(\ln\hat{\lambda} - \ln\tilde{\lambda}\right) \frac{\partial \ln\tilde{\lambda}}{\partial \beta_i} = -\left(\ln\hat{\lambda} - \ln\tilde{\lambda}\right) \ln(bpp + \gamma_i) \tag{33}$$
$$\frac{\partial e}{\partial \gamma_i} = -\left(\ln\hat{\lambda} - \ln\tilde{\lambda}\right) \frac{\partial \ln\tilde{\lambda}}{\partial \gamma_i} = -\left(\ln\hat{\lambda} - \ln\tilde{\lambda}\right) \frac{\beta_i}{bpp + \gamma_i}, \tag{34}$$

Based on the LMS update rule, the model coefficients can be updated using the set of update strengths ($\sigma_\alpha$, $\sigma_\beta$, $\sigma_\gamma$) as follows. Symbols with a dot above denote the coefficients after the update.
$$\dot{\alpha}_i := \alpha_i + \sigma_\alpha \left(\ln\hat{\lambda} - \ln\tilde{\lambda}\right) \frac{1}{\alpha_i}, \tag{35}$$
$$\dot{\beta}_i := \beta_i + \sigma_\beta \left(\ln\hat{\lambda} - \ln\tilde{\lambda}\right) \ln(bpp + \gamma_i), \tag{36}$$
$$\dot{\gamma}_i := \gamma_i + \sigma_\gamma \left(\ln\hat{\lambda} - \ln\tilde{\lambda}\right) \frac{\beta_i}{bpp + \gamma_i}. \tag{37}$$

Here $\hat{\lambda}$ is the λ used to encode the frame and $\tilde{\lambda}$ is the model output at the observed $bpp$. In the proposed algorithm, the initial update strengths of α, β and γ are set to 0.05, 0.2 and 0.000001 times the target bpp, and they gradually decrease during the encoding process under a decay scheme. Inspired by the decreasing learning rate schemes that are widely used in deep learning applications, an exponential decay scheme is used in the proposed algorithm for better convergence and noise reduction. Each frame level maintains a set of model coefficients as well as a decay value, which is multiplied by 0.99 after a frame of that level is encoded. Furthermore, the efficient scene change detection algorithm proposed in our previous work [23] is included in the proposed algorithm. If a scene change is observed, the model parameters are reset to their initial values and the decay value is reset to 1. It should be noted that the videos in the HEVC common test condition do not include scene changes, so the scene change detection module was disabled in the experiment.

D. Rate Allocation
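Before turning to rate allocation, the LMS update of Equations (35)-(37) in Section III-C can be illustrated with a minimal sketch. It is not the HM implementation: the names `LevelModel`, `model_lambda` and `lms_update` are hypothetical, the update strengths are passed in as plain numbers (how they scale with the target bpp is left to the caller), and the effective strength is assumed to be the initial strength multiplied by the per-level decay value.

```python
import math
from dataclasses import dataclass


@dataclass
class LevelModel:
    """Coefficients of lambda = alpha * (bpp + gamma) ** beta for one frame level."""
    alpha: float
    beta: float
    gamma: float
    decay: float = 1.0  # shrinks the update strengths over time


def model_lambda(m: LevelModel, bpp: float) -> float:
    """Evaluate the R-lambda relation of Equations (28)/(29)."""
    return m.alpha * (bpp + m.gamma) ** m.beta


def lms_update(m: LevelModel, lam_used: float, bpp_out: float,
               strengths=(0.05, 0.2, 1e-6)) -> float:
    """Apply Equations (35)-(37) once; return the log error before the update."""
    x = bpp_out + m.gamma
    err = math.log(lam_used) - math.log(model_lambda(m, bpp_out))
    # Assumption: effective strength = initial strength * current decay value.
    s_a, s_b, s_g = (s * m.decay for s in strengths)
    # Compute all three steps from the old coefficients, then apply them together.
    d_alpha = s_a * err / m.alpha
    d_beta = s_b * err * math.log(x)
    d_gamma = s_g * err * m.beta / x
    m.alpha += d_alpha
    m.beta += d_beta
    m.gamma += d_gamma
    m.decay *= 0.99  # exponential decay after each frame of this level
    return err


# Illustrative values: RA level-2 initial coefficients from the text.
model = LevelModel(alpha=4.4, beta=-1.35, gamma=0.005)
lam = model_lambda(model, 0.10)                     # lambda chosen for a 0.10 bpp target
err_before = lms_update(model, lam, bpp_out=0.12)   # encoder actually produced 0.12 bpp
err_after = math.log(lam) - math.log(model_lambda(model, 0.12))
```

In this example the encoder overshoots the target, so the log error is positive and one update moves the model's prediction at the observed rate closer to the λ that was actually used.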

The rate allocation scheme in the proposed algorithm can be separated into two levels, GOP level and picture level. CUs inside a frame share the same set of coding parameters for a spatially consistent output quality. The rates mentioned in this section are all measured in bpp.

As a special case, I frames are usually very different from inter-picture coded frames in terms of their R-D relationship, so there is usually a special rate control module for I frames. Similar to [10], [1], the proposed rate allocation module first treats an I frame as an inter-picture coded frame and obtains a target bit rate for it. Then the algorithm in [31] is used to refine the target bit rate and obtain a new target bit rate as well as a new set of coding parameters. As I frames usually consume a bit rate that is much higher than the average target bit rate, Li et al. [1] proposed a smooth window scheme to mitigate the overflow in bit rate consumption, where the rate overflow at any instant is compensated over the next smooth window (typically 40 frames). The smooth window scheme works well in most cases, but it may fail in cases with excessive bit consumption. In addition, the compensation is unbalanced if the planned intra period and the length of the smooth window are not aligned.

To solve this problem, an amortization and smooth window joint scheme with a restriction on the maximum bit rate of I frames is designed in the proposed rate control algorithm. “Amortization” is a process where the “debt”, i.e. the excessive consumption of bit rate caused by I frame encoding, is paid off (i.e. compensated) by the remaining non-I frames within the current intra period. For example, an I frame (frame number $i$) is first considered as a P frame and allocated a target bit rate ($R_i^{tar}$).
The initial target $R_i^{tar}$ is refined using the algorithm in [31] into a new target bit rate $R_i^{ref}$, which is restricted to be no greater than half of the target rate consumption of the intra period. After encoding, the I frame is encoded into a different number of bits, $R_i^{out}$. The rate to be recorded (denoted as $R_i^{rec}$, which is $R_i^{tar}$ for I frames and $R_i^{out}$ for non-I frames) is used in the smooth window module, while the overhead of I frames (i.e. $R_i^{out} - R_i^{rec}$) is amortized by the remaining non-I frames within the current intra period as follows,

$$R_{am} = \frac{R_i^{out} - R_i^{rec}}{IntraPeriod - 1}, \tag{38}$$

where $R_{am}$ is the averaged amortized rate compensation for each frame, and $IntraPeriod$ is the length of the current intra period. It should be noted that the I-frame overhead is amortized in a weighted manner rather than uniformly. Details are introduced in the frame level rate allocation part. $R_{am}$ is an averaged compensation that makes the calculation easier. On top of the amortization scheme, the smooth window module proposed in [1] only accumulates and compensates the overflow caused by non-I frames.

Given the current status of amortization, the proposed GOP level bit allocation first deducts the target amortization from the average target bit rate. Then the overflow caused by non-I frames is compensated uniformly for each GOP within a smooth window, which is set to 40 frames in the proposed algorithm. Accordingly, the target bit rate of the current GOP ($R_{GOP}$) can be calculated as follows.

$$R_{of} = \sum_{i=0}^{N-1} \left(R_i^{rec} - R_i^{tar}\right) \tag{39}$$
$$R_{GOP} = \left(R_{avg} - R_{am} - \frac{R_{of}}{SW}\right) N_{GOP}. \tag{40}$$

$R_{of}$ is the accumulated non-I rate overflow of the $N$ frames that have been encoded. $SW$ is the length of the smooth window. $R_{avg}$ is the average target bit rate, and $N_{GOP}$ is the length of the current GOP.

Within the GOP, the target bit rate for each frame is allocated in a weighted manner, aiming at a sensible hierarchical quality distribution.
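The GOP-level budget of Equations (38)-(40) can be sketched in a few lines. This is an illustration under stated assumptions rather than the reference implementation: the function names are hypothetical, the I-frame overhead is read as output bits minus the recorded rate, the non-I overflow as recorded rate minus target, and the overhead is spread over `intra_period - 1` frames. All rates are in bpp and the numbers are made up.

```python
def amortized_rate(i_frame_out, i_frame_recorded, intra_period):
    """Equation (38): I-frame overhead spread over the remaining non-I frames."""
    return (i_frame_out - i_frame_recorded) / (intra_period - 1)


def gop_target(r_avg, r_am, recorded, targets, smooth_window, n_gop):
    """Equations (39)-(40): GOP budget after amortization and overflow compensation."""
    # Accumulated overflow of the frames encoded so far (Equation (39)).
    r_of = sum(rec - tgt for rec, tgt in zip(recorded, targets))
    # Per-frame budget, scaled by the GOP length (Equation (40)).
    return (r_avg - r_am - r_of / smooth_window) * n_gop


# Hypothetical numbers: an I frame that cost 1.0 bpp but is recorded as 0.2 bpp,
# an intra period of 33 frames, a 40-frame smooth window and a GOP of 8 frames.
r_am = amortized_rate(i_frame_out=1.0, i_frame_recorded=0.2, intra_period=33)
r_gop = gop_target(r_avg=0.1, r_am=r_am,
                   recorded=[0.12, 0.08, 0.11], targets=[0.10, 0.10, 0.10],
                   smooth_window=40, n_gop=8)
```

With these inputs the amortized compensation is 0.025 bpp per frame, which takes a quarter of the 0.1 bpp average budget before the smooth window correction is applied.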
As the output bit rate can be estimated by Equation (12), the optimal frame level rate allocation can be considered as an optimal central λ selection problem, which is formulated as follows,

$$\min_{\lambda} \left( abs\left( \sum_{i=0}^{N_{GOP}-1} \max(R_i, minRate) - R_{GOP} \right) \right), \tag{41}$$
$$R_i = \left(\frac{\lambda \omega_i}{\alpha_i}\right)^{1/\beta_i} - \gamma_i, \tag{42}$$

where $\alpha_i$, $\beta_i$, $\gamma_i$ are the model coefficients for frame $i$, λ is the central λ value to be solved, and $\omega_i$ is the λ multiplier for each frame. $abs(\cdot)$ is the absolute value function. As in [10], $minRate$ of one frame is set to 100 bits as the minimum achievable number of bits in the proposed algorithm. Similar to the hierarchical scheme introduced in Sec. III-B, frames of the same frame level share the same value of ω.

Based on Equation (10), the relationship between $D$ and λ can also be written in the forms of Equations (43) and (44),

$$\left(\frac{D}{C}\right)^{\frac{K+1}{K}} = R^{-K-1} = \frac{\lambda}{CK}, \tag{43}$$
$$D^{\frac{K+1}{K}} = \frac{\lambda}{K} C^{\frac{1}{K}}, \tag{44}$$
$$D^2 \approx \lambda C. \tag{45}$$

Expression (45) is approximated from Equation (44) based on the fact that $K$ is usually close to 1, which was mentioned in [10]. Table IV shows that the $C$ coefficients for different frame levels of RA follow a proportional relationship of [4.2:3:2:1]. Therefore, a reciprocal relationship of λ would roughly produce a similar output quality for frames of different frame levels. As the hierarchical structure was designed to encode a more important frame at a better quality, the proposed rate allocation algorithm uses a relationship of [1:2.5:4.5:10] for the ω values of different frame levels in RA. After the ω values are specified, the optimization in Expression (41) can be solved using an iterative binary search. The corresponding ω values for LD are set to [1:4:5].
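The binary search for the central λ in Expressions (41) and (42) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the search bracket `[1e-3, 1e5]`, the iteration count, the bpp-valued `min_rate` and the one-frame-per-level GOP are all assumptions made for the example. It relies only on the fact that, with β < 0, the estimated GOP rate decreases monotonically as λ increases.

```python
import math


def frame_rate(lam, omega, alpha, beta, gamma):
    """Equation (42): estimated rate of one frame for a central lambda."""
    return (lam * omega / alpha) ** (1.0 / beta) - gamma


def gop_rate(lam, models, omegas, min_rate):
    """Sum inside Expression (41), with each frame clamped to min_rate."""
    return sum(max(frame_rate(lam, w, a, b, g), min_rate)
               for (a, b, g), w in zip(models, omegas))


def solve_central_lambda(models, omegas, r_gop, min_rate,
                         lo=1e-3, hi=1e5, iterations=60):
    """Bisect in the log domain; the GOP rate falls as lambda rises (beta < 0)."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if gop_rate(mid, models, omegas, min_rate) > r_gop:
            lo = mid   # too many bits: lambda must increase
        else:
            hi = mid   # at or under budget: lambda can decrease
    return math.sqrt(lo * hi)


# RA-style example: alpha and gamma in ratio 4.2:3:2:1 around (4.4, 0.005), beta = -1.35.
ratios = [4.2 / 3, 1.0, 2 / 3, 1 / 3]
models = [(4.4 * r, -1.35, 0.005 * r) for r in ratios]
omegas = [1.0, 2.5, 4.5, 10.0]
lam = solve_central_lambda(models, omegas, r_gop=0.4, min_rate=1e-4)
rates = [frame_rate(lam, w, a, b, g) for (a, b, g), w in zip(models, omegas)]
```

The resulting per-frame rates add up to the GOP budget and decrease from frame level 1 to frame level 4, which is the hierarchical distribution the ω multipliers are meant to produce.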
The fixed values above were selected experimentally, by trying various values in combination with the remaining parts of the proposed algorithm to obtain a higher coding efficiency.

After the frame level coding parameters are determined, CUs inside a frame share the same set of coding parameters, which may slightly reduce the accuracy of rate control but improves coding efficiency and spatial consistency in output quality.

E. Consistency Control

After the λ value is specified, QP can be calculated using Equation (16). To keep the output quality consistent over time, λ and QP must not change significantly. Therefore, a constraint on the maximum QP difference between frames is used. The maximum QP difference between two consecutive frames of the same frame level is set to 3, while the maximum QP difference between two consecutively encoded frames is 10.

IV. EXPERIMENTAL RESULTS

A. Experiments Set-Up

To evaluate the performance of the proposed algorithm, it was implemented in the HEVC reference software HM-16.9. Besides the proposed algorithm, the state-of-the-art rate control algorithms [7], [10], [1], [12], [15] were also tested in the experiment. It should be noted that the algorithms in [10] and [1] each provide two modes, with or without the LCU-level separate model. The corresponding results of the two modes are denoted with the suffix Frame/LCU in the tables. The default setting of HM enables the LCU-level separate model. It is found that the rate control algorithm in HM achieves a higher coding efficiency and a higher bit rate error without the LCU-level separate model, so both modes were tested in the experiments.

The HEVC common test condition [29] is used as the core test of the experiment. All 20 videos (roughly 10 seconds each) in classes A to E of the HEVC common test condition are used as the test sequences to cover various video characteristics. According to the HEVC common test condition, the sequences in classes B, C, D and E are tested for the LD test, including the low delay P (LDP) configuration and the low delay B (LDB) configuration, while the sequences in classes A, B, C and D are tested in the RA test. As instructed in the common test condition, the test sequences

Footnote (on HM-16.9): HM-16.x adopts the same RC algorithm [1] as HM-14, as well as the same coding syntax. It is widely acknowledged that no gain in coding efficiency has been introduced since HM-14.
Footnote (on the HEVC common test condition): More details of the HEVC common test condition can be found in http://phenix.int-evry.fr/jct

TABLE V: RD Performance for RA Configuration, CQP as Anchor (BDBR and ∆R in %)

Class | Clip | [7] BDBR | [7] ∆R | [10]-Frame BDBR | [10]-Frame ∆R | [10]-LCU BDBR | [10]-LCU ∆R | [1]-Frame BDBR | [1]-Frame ∆R | [1]-LCU BDBR | [1]-LCU ∆R | [15] BDBR | [15] ∆R | Proposed BDBR | Proposed ∆R
A | NebutaFestival | 9.55 | 0.62 | 14.20 | 0.88 | 14.78 | 0.57 | 4.06 | 0.43 | 2.97 | 0.20 | -2.94 | 1.53 | 3.74 | 0.79
A | PeopleOnStreet | 44.35 | 0.74 | 41.95 | 0.96 | 50.13 | 0.26 | 18.13 | 0.05 | 24.91 | 0.01 | 22.30 | 1.22 | 1.57 | 0.27
A | SteamLoco* | 202.06 | 0.77 | 62.31 | 2.96 | 54.93 | 3.05 | 44.76 | 3.20 | 46.29 | 3.69 | 43.65 | 2.46 | 45.57 | 3.91
A | Traffic | 65.00 | 1.49 | 6.03 | 0.60 | 6.06 | 0.52 | 1.20 | 0.77 | 4.57 | 0.87 | 0.55 | 0.56 | 0.77 | 1.51
B | BasketballDrive | 33.22 | 0.87 | 4.65 | 0.65 | 7.78 | 0.61 | 2.00 | 0.67 | 6.60 | 0.57 | -0.26 | 0.43 | -1.84 | 0.43
B | BQTerrace | 57.54 | 0.40 | 3.74 | 0.88 | 5.76 | 0.67 | 3.88 | 1.44 | 7.44 | 0.93 | 5.01 | 2.11 | -0.37 | 1.79
B | Cactus | 76.00 | 0.53 | 0.86 | 0.09 | 1.60 | 0.04 | 1.71 | 0.06 | 3.18 | 0.03 | -3.13 | 1.60 | -2.48 | 0.03
B | Kimono | 44.72 | 1.30 | 8.86 | 0.13 | 10.18 | 0.07 | 8.08 | 0.23 | 9.86 | 0.45 | 3.81 | 0.83 | 8.91 | 0.41
B | ParkScene | 54.72 | 0.96 | 2.95 | 0.13 | 5.28 | 0.38 | 2.74 | 0.31 | 3.14 | 0.03 | -3.18 | 1.09 | 0.01 | 0.68
C | BasketballDrill | 58.56 | 0.50 | 3.81 | 1.14 | 3.60 | 1.12 | 1.41 | 1.22 | 0.58 | 1.06 | -2.52 | 0.76 | -3.37 | 0.71
C | BQMall | 87.82 | 0.43 | 15.87 | 1.07 | 18.38 | 0.65 | 13.52 | 1.04 | 12.55 | 0.60 | 8.55 | 0.67 | 4.74 | 1.23
C | PartyScene | 102.03 | 0.23 | 10.89 | 0.54 | 13.59 | 0.54 | 4.77 | 0.50 | 3.94 | 0.29 | -0.64 | 0.42 | 0.04 | 0.51
C | Racehorses | 48.13 | 1.03 | 14.23 | 0.05 | 16.46 | 0.09 | 7.12 | 0.06 | 9.61 | 0.05 | 4.15 | 1.22 | 2.67 | 0.01
D | BasketballPass | 27.64 | 0.42 | 6.75 | 0.72 | 9.91 | 1.01 | 3.82 | 0.73 | 4.75 | 1.05 | 2.74 | 0.70 | 0.16 | 0.56
D | BlowingBubbles | 70.31 | 0.73 | 17.77 | 1.73 | 25.97 | 0.75 | 10.95 | 1.11 | 15.25 | 0.61 | 9.27 | 1.16 | 2.45 | 1.54
D | BQSquare | 91.74 | 0.42 | 7.22 | 1.05 | 8.99 | 1.01 | 5.01 | 1.28 | 5.88 | 1.03 | -1.02 | 2.54 | 0.30 | 0.96
D | Racehorses | 44.49 | 1.13 | 10.13 | 0.27 | 12.56 | 0.20 | 3.40 | 0.34 | 4.44 | 0.24 | -2.76 | 2.00 | -0.56 | 0.20
 | Average | 65.76 | 0.74 | 13.66 | 0.81 | 15.65 | 0.68 | 8.03 | 0.79 | 9.76 | 0.69 | 4.92 | 1.25 | 3.67 | 0.92
 | Average Without * | 57.24 | 0.74 | 10.62 | 0.68 | 13.19 | 0.53 | 5.74 | 0.64 | 7.48 | 0.50 | 2.50 | 1.18 | 1.05 | 0.73

TABLE VI: RD Performance for LDP Configuration, CQP as Anchor (BDBR and ∆R in %)

Class | Clip | [7] BDBR | [7] ∆R | [10]-Frame BDBR | [10]-Frame ∆R | [10]-LCU BDBR | [10]-LCU ∆R | [1]-Frame BDBR | [1]-Frame ∆R | [1]-LCU BDBR | [1]-LCU ∆R | [12] BDBR | [12] ∆R | Proposed BDBR | Proposed ∆R
B | BasketballDrive | 17.70 | 0.54 | 0.54 | 0.64 | 7.37 | 0.61 | 0.53 | 0.64 | 4.51 | 0.54 | 4.44 | 0.62 | 3.60 | 0.64
B | BQTerrace | 53.03 | 0.79 | 0.38 | 1.27 | 1.09 | 1.15 | 0.32 | 1.25 | 5.84 | 1.05 | 0.70 | 1.10 | 0.58 | 1.21
B | Cactus | 42.82 | 0.11 | -5.61 | 0.10 | -4.42 | 0.03 | -5.56 | 0.04 | -2.13 | 0.02 | -5.23 | 0.02 | -8.29 | 0.04
B | Kimono | 33.48 | 0.30 | 8.65 | 0.51 | 14.10 | 0.03 | 6.96 | 1.72 | 9.21 | 0.03 | 6.14 | 0.03 | 6.51 | 0.81
B | ParkScene | 39.82 | 0.43 | 1.25 | 0.23 | 4.82 | 0.04 | 0.63 | 0.78 | 2.37 | 0.04 | 0.66 | 0.02 | 1.00 | 0.57
C | BasketballDrill | 11.51 | 1.32 | -5.43 | 1.32 | -3.95 | 1.21 | -5.39 | 1.35 | -5.78 | 1.24 | -6.13 | 1.29 | -7.53 | 1.35
C | BQMall | 22.38 | 1.18 | 7.08 | 1.09 | 9.08 | 1.02 | 7.08 | 1.09 | 7.48 | 1.04 | 6.51 | 0.99 | 5.74 | 1.06
C | PartyScene | 69.89 | 1.51 | 2.16 | 0.69 | 7.43 | 0.70 | 2.17 | 0.69 | 2.77 | 0.59 | 1.32 | 0.62 | 2.45 | 0.66
C | Racehorses | 20.67 | 0.07 | 7.63 | 0.29 | 14.84 | 0.57 | 7.79 | 0.11 | 9.94 | 0.07 | 9.38 | 0.07 | 8.31 | 0.06
D | BasketballPass | 9.85 | 1.72 | 1.33 | 1.05 | 7.94 | 1.05 | 1.39 | 1.04 | 2.20 | 1.00 | 2.21 | 1.30 | 2.38 | 1.09
D | BlowingBubbles | 21.22 | 1.13 | 1.09 | 1.05 | 7.62 | 1.02 | 1.08 | 1.04 | 2.28 | 0.94 | 0.57 | 0.87 | 2.53 | 1.00
D | BQSquare | 41.05 | 1.87 | 1.26 | 1.53 | 7.42 | 1.42 | 1.26 | 1.57 | 2.13 | 1.42 | 1.43 | 1.40 | -0.35 | 1.50
D | Racehorses | 10.25 | 0.13 | 2.69 | 0.29 | 7.73 | 0.26 | 2.75 | 0.21 | 3.48 | 0.18 | 3.12 | 0.18 | 3.14 | 0.18
E | FourPeople | 19.18 | 0.58 | 6.28 | 0.08 | 11.87 | 0.06 | 6.18 | 0.18 | 2.25 | 0.08 | -2.85 | 0.05 | -8.10 | 0.16
E | Johnny | 70.91 | 0.47 | 12.79 | 0.15 | 27.62 | 0.06 | 12.80 | 0.12 | 6.67 | 0.06 | 1.83 | 0.05 | -5.39 | 0.09
E | Kristen&Sara | 37.86 | 0.20 | 5.81 | 0.08 | 15.48 | 0.07 | 5.80 | 0.09 | 0.31 | 0.10 | -6.28 | 0.11 | -11.33 | 0.09
 | Average | 32.60 | 0.77 | 2.99 | 0.65 | 8.50 | 0.58 | 2.86 | 0.74 | 3.35 | 0.52 | 1.11 | 0.54 | -0.30 | 0.66

TABLE VII: RD Performance for LDB Configuration, CQP as Anchor (BDBR and ∆R in %)

Class | Clip | [7] BDBR | [7] ∆R | [10]-Frame BDBR | [10]-Frame ∆R | [10]-LCU BDBR | [10]-LCU ∆R | [1]-Frame BDBR | [1]-Frame ∆R | [1]-LCU BDBR | [1]-LCU ∆R | [12] BDBR | [12] ∆R | Proposed BDBR | Proposed ∆R
B | BasketballDrive | 19.04 | 0.61 | 5.62 | 0.69 | 8.64 | 0.67 | 0.94 | 0.69 | 4.93 | 0.58 | 5.01 | 0.66 | 3.58 | 0.68
B | BQTerrace | 76.62 | 0.77 | 1.71 | 1.48 | 4.69 | 1.35 | 3.38 | 1.48 | 8.64 | 1.25 | 2.78 | 1.29 | 1.38 | 1.43
B | Cactus | 46.83 | 0.13 | -4.81 | 0.10 | -3.52 | 0.03 | -4.16 | 0.05 | -1.03 | 0.03 | -4.22 | 0.02 | -8.26 | 0.04
B | Kimono | 35.64 | 0.28 | 11.81 | 0.55 | 13.58 | 0.03 | 6.48 | 1.71 | 8.52 | 0.02 | 5.57 | 0.02 | 5.74 | 0.84
B | ParkScene | 41.17 | 0.42 | 2.38 | 0.32 | 4.93 | 0.06 | 0.95 | 0.77 | 2.76 | 0.05 | 0.94 | 0.04 | 1.10 | 0.64
C | BasketballDrill | 38.42 | 1.67 | -6.33 | 1.33 | -3.32 | 1.25 | -4.77 | 1.40 | -5.39 | 1.25 | -5.56 | 1.29 | -7.00 | 1.37
C | BQMall | 22.42 | 1.15 | 9.32 | 1.10 | 9.54 | 1.06 | 7.86 | 1.12 | 7.81 | 1.07 | 6.56 | 1.02 | 6.18 | 1.08
C | PartyScene | 58.28 | 1.21 | 4.89 | 0.70 | 8.13 | 0.70 | 2.40 | 0.69 | 3.05 | 0.60 | 1.56 | 0.63 | 1.71 | 0.68
C | Racehorses | 20.98 | 0.04 | 13.31 | 0.26 | 15.53 | 0.51 | 8.04 | 0.06 | 10.63 | 0.03 | 10.32 | 0.02 | 8.39 | 0.05
D | BasketballPass | 8.72 | 1.63 | 5.65 | 1.09 | 8.22 | 1.11 | 1.63 | 1.12 | 2.17 | 1.03 | 2.49 | 1.35 | 2.55 | 1.12
D | BlowingBubbles | 19.75 | 1.18 | 4.94 | 1.05 | 7.66 | 1.03 | 0.91 | 1.02 | 2.00 | 0.93 | 0.02 | 0.87 | 2.19 | 1.03
D | BQSquare | 41.26 | 1.67 | 8.06 | 1.56 | 10.93 | 1.48 | 2.85 | 1.62 | 3.66 | 1.42 | 3.02 | 1.45 | 0.61 | 1.47
D | Racehorses | 10.30 | 0.13 | 7.03 | 0.28 | 8.63 | 0.18 | 3.15 | 0.14 | 3.62 | 0.12 | 3.34 | 0.12 | 3.30 | 0.14
E | FourPeople | 19.21 | 0.71 | 14.76 | 0.10 | 12.53 | 0.07 | 6.77 | 0.20 | 3.07 | 0.08 | -2.21 | 0.07 | -7.44 | 0.19
E | Johnny | 69.72 | 0.71 | 34.84 | 0.37 | 30.70 | 0.26 | 16.58 | 0.34 | 9.64 | 0.26 | 2.95 | 0.26 | -2.68 | 0.31
E | Kristen&Sara | 39.79 | 0.22 | 18.94 | 0.08 | 15.94 | 0.10 | 7.26 | 0.08 | 1.55 | 0.10 | -5.16 | 0.10 | -10.30 | 0.09
 | Average | 35.51 | 0.78 | 8.26 | 0.69 | 9.55 | 0.62 | 3.77 | 0.78 | 4.10 | 0.55 | 1.71 | 0.57 | 0.07 | 0.70

Fig. 3: Per-Frame Bits and PSNR for PeopleOnStreet and BlowingBubbles. Panels: (a) PeopleOnStreet Bits (HM-16, r = 0.942; Proposed, r = 0.975), (b) PeopleOnStreet PSNR, (c) BlowingBubbles Bits (HM-16, r = 0.950; Proposed, r = 0.992), (d) BlowingBubbles PSNR. Each panel plots CQP, HM-16 and Proposed against Frame Number (frames 70 to 150).

Fig. 4 panels (PSNR curves for HM-16 and Proposed): (a) RA, PartyScene, 480p; (b) RA, BlowingBubbles, 240p; (c) LDP, BQTerrace, 1080p; (d) LDP, FourPeople, 720p; (e) LDB, Cactus, 1080p; (f) LDB, Johnny, 720p.

Fig. 4: Comparisons of the RD Curves for Different Sequences

were first encoded by HM-16.9 using the CQP mode (QP in { }) and three configurations to generate the target average bit rates of the later rate control (ABR) encoding. The results of the CQP mode are also used as the anchor of the comparison, as CQP is widely accepted as the statistical upper limit of the coding efficiency that one-pass rate control algorithms can approach.

Then different rate control algorithms were used to encode the test sequences at the corresponding target bit rates generated by the CQP mode. As to the state-of-the-art algorithms, it should be noted that the algorithm in [12], the best rate control algorithm for the LDP and LDB configurations, was designed specifically for LD, while the algorithm in [15], so far the best RC algorithm for RA, was proposed solely for RA. As a result, the algorithms in [12] and [15] were only tested for LD and RA respectively.

As instructed in the HEVC common test condition, the algorithms were evaluated using two metrics, PSNR based coding efficiency and sequence-level rate control accuracy. Four points of encoding results (bit rate and MSE based YUV PSNR) were used to calculate the BDBR between the CQP mode and the RC algorithms. BDBR, proposed in [5], estimates the change in bit rate that the current codec needs to achieve the same quality as the anchor codec. A negative BDBR value indicates an average bit rate saving for a given quality after compression, i.e. a gain in coding efficiency. Rate control accuracy was measured by the averaged absolute bit rate error per sequence using the following formula:

$$\Delta R = \frac{|R_{out} - R_{target}|}{R_{target}} \times 100\%, \tag{46}$$

where $|\cdot|$ denotes the absolute value.
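Equation (46) and its per-sequence average are simple enough to state in code. The sketch below is only illustrative: the function names are hypothetical and the rate pairs are made-up numbers in kbps, standing in for the four rate points of one sequence.

```python
def rate_error_percent(r_out, r_target):
    """Equation (46): absolute bit rate error relative to the target, in percent."""
    return abs(r_out - r_target) / r_target * 100.0


def sequence_rate_error(points):
    """Average of Equation (46) over the (output, target) pairs of one sequence."""
    return sum(rate_error_percent(out, tgt) for out, tgt in points) / len(points)


# Hypothetical (output, target) bit rates in kbps for four rate points.
points = [(1010.0, 1000.0), (1985.0, 2000.0), (4000.0, 4000.0), (8080.0, 8000.0)]
delta_r = sequence_rate_error(points)
```

Because the error is taken as an absolute value before averaging, overshoots and undershoots at different rate points do not cancel each other out.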
The experiments were conducted on a server with dual Intel Xeon E5-2695 v2 CPUs, without optimization of parallelism or SIMD.

To better evaluate the performance of the proposed algorithm, subjective quality (SSIM [32]) based coding efficiency (BDBR-SSIM) is also analyzed in the experiment, in addition to the HEVC common test condition, which only compares the objective quality (PSNR) based coding efficiency. The SSIM value is designed to lie in the range [0, 1], with a greater SSIM value suggesting a better subjective quality. However, SSIM is not a distance metric, so it is usually first translated into a dB value ($SSIM_{dB}$) using Equation (47) for a better fit in the BDBR tool, especially when SSIM values are very close to 1.

SSIM dB = min ( − ∗ log (1 − ssim ) , , (47)

In addition, the respective impact of the modules in the proposed algorithm is also tested and discussed, including objective and subjective quality based coding efficiency as well as computing complexity.

B. Performance for HEVC Common Test Condition

Table V gives the results of various algorithms for RA. The default algorithm in HM ([1]-LCU) is 9.76% worse than the CQP mode, while the state-of-the-art RC algorithm for RA [15] reduces the gap to 4.92% at the cost of a nearly doubled rate control error. The proposed algorithm is only 3.67% away from CQP, while its rate control error is 0.92%, which is lower than the 1.25% of [15]. It should be noted that among the test sequences, SteamLoco (marked with ∗ in Table V) is an outlier due to its extremely high complexity. SteamLoco is a video captured by a moving camera, which contains a running steam locomotive with plenty of billowing steam, i.e. rich and volatile texture with limited temporal similarity. Therefore, SteamLoco is not suitable for hierarchical schemes, which are built on the assumption of high temporal similarity. None of the algorithms is able to handle this video properly, and the BDBR losses over CQP of even the best RC algorithms are around 45%. The BDBR results of SteamLoco are much greater than those of the other videos, which has a significant influence on the average results. Therefore, the bottom row of Table V also provides the averaged results without SteamLoco to better compare the performance on the other videos. Under that criterion, the proposed algorithm is only 1.05% worse than the CQP mode, while [1]-Frame, [1]-LCU and [15] are 5.74%, 7.48% and 2.50% worse than the CQP mode respectively.

Tables VI and VII give the results of the LDP and LDB configurations. Experimental results show that the proposed algorithm is on average 0.30% better than CQP for LDP and only 0.07% worse than CQP for LDB, with a very low bit rate error. The proposed algorithm is the first rate control algorithm that is able to reach the statistical upper limit of the coding efficiency of single-pass rate control algorithms for LD.
On the other hand, [1]-LCU, the default RC algorithm in HM, is 3.35% and 4.10% worse than CQP for LDP and LDB, while the best RC algorithm for LD in [12] is 1.11% and 1.71% worse than CQP for LDP and LDB respectively. It should be noted that the results of [12] are slightly different from the results in their paper, because a different setting of the option “KeepHierarchicalBit” was used in [12]. In the experiment of [12], the option “KeepHierarchicalBit” was set to 1 for the anchor configuration, while the default setting in HM-16.9 is 2, which outperforms 1 in coding efficiency.

It should be noted that the proposed algorithm achieves a better coding efficiency than the CQP mode for some videos. This result does not contradict the claim that the coding efficiency of the CQP mode is the “statistical” upper limit of single-pass RC algorithms. The coefficients in the QP-λ relationship in Equation (2) and the hierarchical structures are statistically optimal for the HEVC common test condition, but they are not necessarily optimal for any single video. As a result, it is possible that an RC algorithm produces a rate allocation and coding parameter selection that is closer to the per-sequence optimum than the CQP mode.

Fig. 3 gives some examples of per-frame rate consumption and PSNR using the proposed algorithm, the default RC algorithm in HM [1] and the CQP mode, where $r$ is the correlation coefficient between the output per-frame rates of the RC algorithms and CQP. The CQP mode is usually considered the statistical upper bound of the coding efficiency of single-pass rate control algorithms, because the CQP mode was designed to encode a video with a reasonable distribution of quality after compression. The output per-frame rates of the CQP mode suggest a favorable distribution of per-frame rates to reach that favorable distribution of quality. As a result, a higher correlation indicates a better rate allocation. Fig. 3-(a) and 3-(c) show that the proposed algorithm is able to produce per-frame rates that are much closer to the CQP output than [1]. Fig. 3-(b) gives an example of improper overflow compensation by [1], which causes a significant quality deterioration after a frame that consumes too many bits. In contrast, the proposed algorithm encodes that part with a more temporally consistent quality. Fig. 4 gives some examples of RD curve comparisons for different configurations, which show that the proposed algorithm is able to steadily improve the coding efficiency over a wide range of bit rates. Fig. 4-(d) and 4-(f) show that the proposed algorithm is able to provide a higher gain for cases targeting very low bit rates.

C. Module-level Subjective and Objective Performance

The module-level experiment includes objective and subjective performance evaluation as well as complexity analysis. As some modules in the proposed algorithm, like the new update scheme, are designed to work with the proposed R-D-λ model, it is hard to test and analyze those modules separately. Therefore, the module-level analysis is conducted in an incremental way. The proposed algorithm is separated into four phases, namely,

1) Phase 1: The proposed R-D-λ model in Equation (11) is used to replace the model used in [10], with the QP-λ relation in Equation (16). The model coefficients are updated using the update scheme proposed in [10].
2) Phase 2: The proposed hierarchical initialization scheme is implemented on top of Phase 1.
3) Phase 3: The proposed rate allocation scheme is implemented on top of Phase 2.
4) Phase 4: The proposed update scheme is used to replace the update scheme proposed in [10] on top of Phase 3. Phase 4 is the proposed algorithm.

TABLE VIII: Per-module Analysis of the Proposed Algorithm, HM-RC [1] as Anchor

Configuration | Metric | Phase 1 | Phase 2 | Phase 3 | Phase 4
RA | BDBR-PSNR | -4.35% | -4.37% | -4.49% | -5.44%
RA | BDBR-SSIM | -1.26% | -1.48% | -1.48% | -2.16%
RA | ∆T | +1.70% | +1.74% | +1.70% | +1.67%
LDP | BDBR-PSNR | -1.68% | -1.75% | -3.27% | -3.62%
LDP | BDBR-SSIM | -0.88% | -1.41% | -2.76% | -3.13%
LDP | ∆T | +1.10% | +0.69% | -0.08% | +0.18%
LDB | BDBR-PSNR | -1.78% | -1.79% | -3.54% | -3.93%
LDB | BDBR-SSIM | -0.85% | -1.34% | -2.98% | -3.42%
LDB | ∆T | +0.99% | +0.63% | +0.01% | -0.40%

Table VIII shows the performance of each phase of the proposed algorithm. The proposed R-D-λ model (Phase 1) provides a large share of the gain in objective quality based coding efficiency (BDBR-PSNR): the Phase 1 and Phase 4 gains are 4.35%/5.44% for RA, 1.68%/3.62% for LDP and 1.78%/3.93% for LDB. The gain caused by the proposed model is higher for RA than for LD because RA allows bi-directional referencing with more available reference frames, which greatly increases the coding efficiency of the frames of a higher frame level. The gap between the bit rates of lossy encoding and lossless encoding is lower for RA than for LD, so the proposed model brings a higher gain for RA.

In addition, the proposed algorithm results in a similar gain in subjective quality based coding efficiency, even though the parameters in the proposed algorithm were tuned using BDBR-PSNR, which indicates that the proposed algorithm is effective in improving both the objective and subjective quality after compression.

Table VIII also provides the results on the complexity change caused by the proposed algorithm. An increase of 0.5% in complexity is observed on average over the three configurations, which is usually within the range of measurement noise and can therefore be considered negligible.

V. CONCLUSION AND DISCUSSION

In this paper, we propose a novel generalized R-D-λ model to better model the relationship between rate, distortion and λ. Based on the new model, a novel average bit rate control (ABR) scheme for HEVC is designed, which includes hierarchical initialization, LMS based update of the model coefficients, and joint amortization and smooth window rate allocation. Experimental results of implementing the proposed algorithm in the HEVC reference software HM-16.9 show that the proposed rate control algorithm achieves the best coding efficiency among the state-of-the-art RC algorithms: it is only 3.67% and 0.07% worse than CQP for the RA and LDB configurations and 0.3% better than CQP for LDP, while rate control accuracy and encoding speed are hardly impacted.

In the future, we will keep investigating the following issues. The proposed algorithm is designed and tested using the HEVC common test condition, which uses a fixed reference structure and an ABR scheme. However, real-world applications are usually much more complicated, and may require CBR, an adaptive reference structure and optimizations for specific usages such as screen content videos and videos of ultra high resolutions (4K, 8K and even higher). It will be valuable to optimize the proposed algorithm for each of these application scenarios. In addition, some of the variables, e.g. the values in Equation (16), are set to fixed numbers in the proposed algorithm, which were selected through tuning experiments. In fact, the selection of those values is a chicken-and-egg problem. It will be beneficial to explore a way to interactively determine these values.

REFERENCES

[1] L. Li, B. Li, H. Li, and C. W. Chen, “λ-domain optimal bit allocation algorithm for high efficiency video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 1, pp. 130–142, 2018.
[2] G. J. Sullivan, J. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, 2012.
[3] I. E. Richardson, “H.264/MPEG-4 part 10 white paper,” 2003.
[4] J. Ohm, G. J. Sullivan, H. Schwarz, T. K. Tan, and T. Wiegand, “Comparison of the coding efficiency of video coding standards including high efficiency video coding (HEVC),” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1669–1684, 2012.
[5] G. Bjontegaard, “Improvements of the BD-PSNR model,” ITU-T SG16 Q, vol. 6, p. 35, 2008.
[6] S. Ma, W. Gao, and Y. Lu, “Rate-distortion analysis for H.264/AVC video coding and its application to rate control,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 12, pp. 1533–1544, 2005.
[7] H. Choi, J. Nam, J. Yoo, D. Sim, and I. Bajic, “Rate control based on unified RQ model for HEVC,” ITU-T SG16 Contribution, JCTVC-H0213, pp. 1–13, 2012.
[8] M. Liu, Y. Guo, H. Li, and C. W. Chen, “Low-complexity rate control based on rho-domain model for scalable video coding,” in ICIP, 2010, pp. 1277–1280.
[9] S. Wang, S. Ma, S. Wang, D. Zhao, and W. Gao, “Rate-GOP based rate control for high efficiency video coding,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 1101–1111, 2013.
[10] B. L. Li, H. Li, and J. Zhang, “λ domain rate control algorithm for high efficiency video coding,” IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 3841–3854, 2014.
[11] J. Wen, M. Fang, M. Tang, and K. Wu, “R-(lambda) model based improved rate control for HEVC with pre-encoding,” in Data Compression Conference (DCC), 2015. IEEE, 2015, pp. 53–62.
[12] S. Li, M. Xu, Z. Wang, and X. Sun, “Optimal bit allocation for CTU level rate control in HEVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 11, pp. 2409–2424, 2017.
[13] J. Xie, L. Song, R. Xie, Z. Luo, and X. Wang, “Temporal dependent bit allocation scheme for rate control in HEVC,” in Signal Processing Systems (SiPS), 2015 IEEE Workshop on. IEEE, 2015, pp. 1–6.
[14] M. Wang, K. N. Ngan, and H. Li, “Low-delay rate control for consistent quality using distortion-based Lagrange multiplier,” IEEE Transactions on Image Processing, vol. 25, no. 7, pp. 2943–2955, 2016.
[15] J. He and F. Yang, “Efficient frame-level bit allocation algorithm for H.265/HEVC,” IET Image Processing, vol. 11, no. 4, pp. 245–257, 2017.
[16] H. Guo, C. Zhu, S. Li, and Y. Gao, “Optimal bit allocation at frame level for rate control in HEVC,” IEEE Transactions on Broadcasting, no. 99, pp. 1–12, 2018.
[17] B. Lee, M. Kim, and T. Q. Nguyen, “A frame-level rate control scheme based on texture and nontexture rate models for high efficiency video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 3, pp. 465–479, 2014.
[18] G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74–90, 1998.
[19] B. Li, D. Zhang, H. Li, and J. Xu, “QP determination by lambda value,” in JCTVC-I0426, 9th JCTVC Meeting, Geneva, Switzerland, 2012.
[20] Z. He, Y. K. Kim, and S. K. Mitra, “Low-delay rate control for DCT video coding via ρ-domain source modeling,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 8, pp. 928–940, 2001.
[21] F. Song, C. Zhu, Y. Liu, and Y. Zhou, “A new GOP level bit allocation method for HEVC rate control,” in Broadband Multimedia Systems and Broadcasting (BMSB), 2017 IEEE International Symposium on. IEEE, 2017, pp. 1–4.
[22] Y. Gong, S. Wan, K. Yang, H. R. Wu, and Y. Liu, “Temporal-layer-motivated lambda domain picture level rate control for random-access configuration in H.265/HEVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 1, pp. 156–170, 2017.
[23] M. Tang, X. Chen, J. Wen, and Y. Han, “Hadamard transform based optimized HEVC video coding,” IEEE Transactions on Circuits and Systems for Video Technology, 2018.
[24] J. Garrett-Glaser, “A novel macroblock-tree algorithm for high-performance optimization of dependent video coding in H.264/AVC,” Tech. Rep., 2009.
[25] T. Yang, C. Zhu, X. Fan, and Q. Peng, “Source distortion temporal propagation model for motion compensated video coding optimization,” IEEE, 2012, pp. 85–90.
[26] A. Fiengo, G. Chierchia, M. Cagnazzo, and B. Pesquet-Popescu, “Rate allocation in predictive video coding using a convex optimization framework,”

IEEE Transactions on Image Processing , vol. 26, no. 1,pp. 479–489, 2016.[27] M. Bichon, J. Le Tanou, M. Ropert, W. Hamidouche, and L. Morin,“Optimal adaptive quantization based on temporal distortion propagationmodel for hevc,”

IEEE Transactions on Image Processing , vol. 28,no. 11, pp. 5419–5434, 2019.[28] M. Ropert, J. Le Tanou, M. Bichon, and M. Blestel, “Rd spatio-temporaladaptive quantization based on temporal distortion backpropagation inhevc,” in . IEEE, 2017, pp. 1–6. [29] F. Bossen et al. , “Common test conditions and software referenceconﬁgurations,”

JCTVC-L1100 , vol. 12, 2013.[30] Y. Wang, M. Claypool, and R. Kinicki, “Impact of reference distancefor motion compensation prediction on video quality,” in

MultimediaComputing and Networking 2007 , vol. 6504. International Society forOptics and Photonics, 2007, p. 650405.[31] M. Karczewicz and X. Wang, “Jctvc-m0257, intra frame rate controlbased on satd,”

ISO/IEC JTC1/SC29 WG11 , 2013.[32] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli et al. , “Imagequality assessment: from error visibility to structural similarity,”

IEEEtransactions on image processing , vol. 13, no. 4, pp. 600–612, 2004.

Minhao Tang received the B.S. degree from the Department of Electronic Engineering, Tsinghua University, Beijing, China, in 2014, and the Ph.D. degree from the Department of Computer Science and Engineering, Tsinghua University, in 2019. Dr. Tang is a senior researcher at the Media Lab of Tencent. Dr. Tang’s research interests include video encoding and transcoding and high-quality panoramic video production. Dr. Tang was a nominee for the 2016 IEEE Trans. CSVT Best Paper Award and the 2017 ICME 10K Best Paper Award. The high-quality video encoding/transcoding solution project Dr. Tang developed won the 2016 Frost & Sullivan best practice in Enabling Technology Leadership award.

Jiangtao Wen received the B.S., M.S., and Ph.D. degrees with honors from Tsinghua University, Beijing, China, in 1992, 1994, and 1996, respectively, all in Electrical Engineering. Dr. Wen’s research focuses on multimedia communication over challenging networks and computational photography. He has authored many widely referenced papers in related fields. Products deploying technologies that Dr. Wen developed are currently widely used worldwide. Dr. Wen holds over 40 patents with numerous others pending. Dr. Wen is an Associate Editor for IEEE Transactions on Circuits and Systems for Video Technology (CSVT). He is a recipient of the 2010 IEEE Trans. CSVT Best Paper Award and a nominee for the 2016 IEEE Trans. CSVT Best Paper Award. Dr. Wen was elected a Fellow of the IEEE in 2011. He is the Director of the Research Institute of the Internet of Things of Tsinghua University, and a Co-Director of the Ministry of Education Tsinghua-Microsoft Joint Lab of Multimedia and Networking. Besides teaching and conducting research, Dr. Wen also invests in high technology companies as an angel investor.