High Efficiency Rate Control for Versatile Video Coding Based on Composite Cauchy Distribution
Yunhao Mao, Meng Wang, Shiqi Wang, Member, IEEE, and Sam Kwong, Fellow, IEEE
Abstract—In this work, we propose a novel rate control algorithm for the Versatile Video Coding (VVC) standard based on its distinct rate-distortion characteristics. By modelling the transform coefficients with a composite Cauchy distribution, we achieve higher accuracy than traditional distributions. Building on this transform coefficient model, we derive R-Q and D-Q models theoretically and incorporate them into the rate control process. These models characterize the RD behaviour of sequences with different content more accurately. Furthermore, to establish an adaptive bit allocation scheme, the dependency between frames at different levels is modelled by a dependency factor that describes the relationship between the reference and to-be-coded frames. Given the derived R-Q and D-Q relationships, as well as the dependency factor, an adaptive bit allocation scheme is developed for optimal bit allocation. We implement the proposed algorithm on VVC Test Model (VTM) 3.0. Experiments show that, owing to proper bit allocation, the proposed algorithm achieves 1.03% BD-Rate saving under the low delay configuration compared with the default rate control algorithm, and 2.96% BD-Rate saving compared with the fixed QP scheme. Moreover, 1.29% BD-Rate saving and higher control accuracy are also observed under the random access configuration.
Index Terms—Versatile video coding, rate control, rate model, distortion model
I. INTRODUCTION

With the widespread use of multimedia services, recent years have witnessed an explosive increase of video data, bringing grand challenges to video data management in terms of storage and transmission. The video coding standards have evolved over several decades, from H.264/AVC [1] and H.265/HEVC [2] to the emerging Versatile Video Coding (VVC) [3] standard. Each generation has repeatedly been shown to improve coding efficiency over its predecessor. A series of novel video coding technologies have been investigated during the standardization of VVC, aiming to provide more efficient video compression solutions. To better adapt to the characteristics of high resolution videos, the size of the coding tree unit (CTU) is enlarged to 128×128. This larger CTU works together with more flexible partitions such as quad-tree, binary-tree and ternary-tree [4]. Besides, enhanced intra and inter prediction technologies [5–9] are investigated to further remove spatial and temporal redundancies. Moreover, multiple transform selection (MTS) is supported to better compact residual energy [10] in the frequency domain. Regarding quantization, dependent quantization is adopted, which maps the quantization candidates within one block onto a trellis. The path with the lowest rate-distortion (RD) cost is selected as the final quantization outcome [11].

Y. Mao, M. Wang, S. Wang and S. Kwong are with the Department of Computer Science, City University of Hong Kong, Hong Kong, China.

As an essential component of an encoder, rate control has been widely investigated since MPEG-2 [12]. It aims to provide the best video quality under a bit-rate budget constraint. Rate control is crucial for real application scenarios in which the bit-rate of the video codec must be regulated. Generally speaking, rate control consists of two main procedures: bit-rate allocation and coding parameter determination. Bit-rate allocation can be performed at three levels: the group of pictures (GOP) level, the frame level, and the CTU level. With GOP-level bit allocation, the encoder assigns the available bits to the to-be-encoded GOPs while taking buffer occupancy into account. Within a GOP, bits are allocated to each frame based on the GOP structure [13] or on pre-analyzed RD characteristics [14]. In the literature, there are two ways to realize frame-level bit allocation: fixed ratio allocation [13] and adaptive ratio allocation [14]. More specifically, fixed ratio bit allocation generally utilizes a predefined ratio depending on the frame structure and target bit-rate. In [14], the authors proposed an adaptive bit allocation algorithm for HEVC based on λ-domain rate control. Adaptive bit allocation algorithms are mostly built on an RD model. Bit-rate control is realized by modelling the relationship among the rate, the distortion, and the coding parameters. These coding parameters could be the Lagrange multiplier λ, the quantization parameter QP (or quantization step size Q), or the percentage of zero coefficients ρ [15]. Existing rate control algorithms attempt to exploit the relationship among QP, the target bit-rate R and λ.
However, most of them merely focus on establishing an elaborately designed relationship between R and QP, or between R and λ. In particular, Q-domain rate control algorithms emphasize the importance of QP while ignoring the role of λ, which is decisive in mode decision. Conversely, in λ-domain rate control, QP is no longer the most critical factor. λ-domain rate control shows an advantage over Q-domain rate control in the HEVC encoder, as it collaborates well with more sophisticated mode selection schemes. Although λ plays an important role in mode decision, its influence on output distortion and bit-rate is still quite obscure. By contrast, QP influences both the mode decision and the quantization outcomes, and these dominate coding distortion and bit-rate. This inspires us to construct a new analytical framework incorporating R, Q and λ to better capture the interconnections among the three. For computational convenience, we employ the quantization step size Q in the proposed model, which can be monotonically mapped from QP.

The rate control philosophy in VVC is inherited from H.265/HEVC, with minor modifications to account for the ever increasing number of SKIP-coded blocks [16]. As more advanced technologies are adopted in VVC, the RD characteristics, as well as the relationship between QP and λ, become more flexible. To further improve rate control efficiency for VVC, in this paper we first propose to model the distribution of transform coefficients with an improved discrete Cauchy distribution. This distribution depicts the behaviour of transform coefficients more accurately. Subsequently, we explore a new relationship among Q, coding bits and distortion based on the discrete Cauchy distribution model. Moreover, we analytically derive an optimal bit allocation scheme at the GOP level and frame level. The scheme leverages the reference dependencies in terms of distortion and coding bits.
In this manner, better RD performance can be achieved with the proposed rate control scheme. Extensive experimental results show that the proposed scheme achieves 1.03% and 1.29% BD-Rate savings compared with the default rate control algorithm in the VTM platform [16], under the low-delay B (LDB) and random-access (RA) configurations, respectively.

II. RELATED WORKS
Existing rate control algorithms [13, 15, 17, 18] strive to model the relationships between coding parameters and bit-rate more precisely, with the aim of capturing the RD characteristics of different video sequences. The most intuitive way to obtain a robust relationship is to encode the sequence for multiple rounds with different QPs. However, this significantly elevates encoding complexity, making it impracticable in one-pass or two-pass coding scenarios. Coding distortion D is mainly introduced by quantization, and the number of output bits R is closely related to the entropy coding of the quantized residuals. As such, it is feasible to model the RD behaviour according to the distribution of transform coefficients.

A. Distribution of Transform Coefficients
In the literature, numerous models have been investigated for the distribution of transform coefficients. In [19], source samples are modelled with a uniform distribution within each quantization interval. Combined with a hard quantization process, this yields a quadratic relationship between the quantization step size Q and the distortion D,

$$D = \frac{Q^2}{12}. \tag{1}$$

However, it is widely acknowledged that the coefficient distribution may not follow the uniform distribution in real application scenarios, and this assumption only holds under high bit-rate conditions [20]. Besides, a series of classical distribution models have been studied in the literature [21–24], such as the Gaussian, Laplacian and Cauchy distributions. The Gaussian distribution is advantageous in parameter estimation but fits the actual distribution poorly [20, 25, 26]. The generalized Gaussian distribution can properly model the coefficient distribution, whereas its controlling parameters are difficult to estimate. The Laplacian distribution has been widely employed in video coding tasks, as it strikes an excellent trade-off between fitting accuracy and the computational complexity of parameter estimation. In [25], Li et al. modelled residuals with a Laplacian distribution and derived closed forms for the R-Q and D-Q expressions. From these, a better λ is inferred for rate-distortion optimization (RDO), bringing 1.60 dB gains on average in terms of PSNR. In [26], a low-complexity rate distortion optimized quantization (RDOQ) scheme is investigated for HEVC, based on hybrid Laplacian distribution modelling. Moreover, Seo et al. [20] proposed a rate control algorithm based on the Laplacian distribution, aiming at minimizing video quality fluctuation. In [27], it was observed that the Cauchy distribution models the distribution of AC coefficients more accurately than the Laplacian distribution. On this basis, a frame-level bit allocation scheme was investigated for H.264/AVC.

B. Rate Control
In rate control, efforts have been devoted to establishing the relationship among QP, R and λ. These methods operate in the ρ domain, the Q domain and the λ domain to regularize the coding bit-rate.

Typically, ρ-domain methods [15] assume a linear relationship between the coding bit-rate R and the percentage of zero coefficients ρ,

$$R = \theta\,(1-\rho), \tag{2}$$

where θ is a parameter relevant to the video content. As such, a one-to-one mapping between R and QP can be derived with the assistance of the intermediate variable ρ. Even though ρ-domain rate control can provide smoother output bit-rates and better objective quality, it was designed for H.263 to cope with a fixed block size, which may impede its further application.

In [17], a complexity-adjustable rate control scheme based on a reliable R-Q relationship was investigated for H.264/AVC. More specifically, a linear relationship between R and $Q^{-1}$ is observed,

$$R = Z\cdot\frac{SAD}{Q} + r_h, \tag{3}$$

where SAD denotes the sum of absolute differences of the motion-compensated macroblock. Z and $r_h$ represent a model parameter and the number of header bits, respectively, and both are typically highly related to the slice type. Compared with the fixed QP configuration, this rate control algorithm achieves 0.33 dB PSNR gain with a negligible increase in coding time.

Regarding λ-domain rate control, HEVC [13] employs a hyperbolic RD relationship, which is recognized to fit better [28] than the conventional exponential function [29]. The relationship between R and D can be formulated as follows,

$$D(R) = U R^{-V}, \tag{4}$$

where U and V are model parameters. Moreover, the RD cost J [30] can be described as

$$J = D + \lambda R. \tag{5}$$

When encoding a sequence, a set of coding parameters that minimizes J is preferable. To find the bit-rate that minimizes J, the derivative of J with respect to R is set to zero,

$$\frac{\partial J}{\partial R} = \frac{\partial D}{\partial R} + \lambda = 0. \tag{6}$$

Combining this with Eqn. (4), the relationship between λ and R is obtained as

$$\lambda = -\frac{\partial D}{\partial R} = \mu R^{\phi}, \tag{7}$$

where μ and ϕ are model parameters closely related to the video content (from Eqn. (4), μ = UV and ϕ = −V − 1). In [13], a parameter updating strategy is employed, with which μ and ϕ are updated on the fly during the coding process. In this manner, given the target bit-rate, the corresponding λ can be obtained through the λ-R relationship in Eqn. (7). Moreover, the associated QP can be derived by a linear transform of ln λ [31],

$$QP = 4.2005\cdot\ln\lambda + 13.7122. \tag{8}$$

To further improve performance, a λ-domain adaptive bit allocation scheme was investigated for HEVC rate control [14]. By exploring the inter-frame dependency, two hypotheses are raised. The first is a linear relationship between the distortions of the reference and current frames. The second is a low dependency between the frame-level bits of the reference and current frames. Subsequently, an optimal bit allocation scheme combined with a predefined ratio is proved to be more effective than a fixed allocation ratio.

In [16], a new parameter estimation strategy for λ-domain rate control is proposed and adopted in VVC. The λ used by the previously encoded frame in the same temporal layer is regarded as the optimal one for the current frame. As such, the RD relationship can be predicted from a specific RD point and its corresponding slope λ. The traditional λ-domain rate control schemes adopted in the VVC reference software show promising RD performance and stable output bit-rate. However, their R-λ and λ-Q relationships are built upon parameter estimation without thorough consideration of the transform coefficients, so they may not fully adapt to the properties of the video content. Considering that RD performance is highly related to the transform coefficients, we propose a distribution-based rate control algorithm.
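The following Python sketch illustrates the λ-domain chain in Eqns. (7) and (8). It maps a target rate in bits per pixel to λ, then to QP, and finally to a quantization step size Q. The values of μ and ϕ in the usage lines are hypothetical placeholders, not fitted parameters. The QP-to-Q conversion assumes the common HEVC/VVC convention in which Q doubles every 6 QP steps, because the text above only states that Q is monotonically mapped from QP. The 0..63 clipping range is likewise an assumption (8-bit VVC).

```python
import math


def lambda_from_rate(bpp: float, mu: float, phi: float) -> float:
    """Eqn. (7): lambda = mu * R**phi, with R in bits per pixel."""
    if bpp <= 0:
        raise ValueError("target rate must be positive")
    return mu * bpp ** phi


def qp_from_lambda(lam: float) -> float:
    """Eqn. (8): real-valued QP from the natural log of lambda."""
    return 4.2005 * math.log(lam) + 13.7122


def qstep_from_qp(qp: float) -> float:
    """Assumed HEVC/VVC convention: Q = 1 at QP 4 and doubles every 6 QP."""
    return 2.0 ** ((qp - 4.0) / 6.0)


def clip_qp(qp: float, lo: int = 0, hi: int = 63) -> int:
    """Round and clip to an assumed integer QP range (0..63 for 8-bit VVC)."""
    return max(lo, min(hi, int(round(qp))))


if __name__ == "__main__":
    # mu and phi are illustrative placeholders, not values from the paper.
    lam = lambda_from_rate(0.05, mu=3.2, phi=-1.37)
    qp = clip_qp(qp_from_lambda(lam))
    print(f"lambda={lam:.2f}  QP={qp}  Q={qstep_from_qp(qp):.2f}")
```

Because ϕ is negative, a smaller bit budget gives a larger λ and hence a larger QP. The λ-domain controller relies on this monotonic behaviour.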
The distribution of transform coefficients is modelled with an improved discrete Cauchy distribution. Based on the proposed model, we derive R-Q and D-Q models that are built upon the characteristics of the video content and use them for encoding parameter estimation.

III. CAUCHY DISTRIBUTION BASED TRANSFORM COEFFICIENT MODELLING
In this section, we establish a new model that characterizes the transform coefficients in VVC with high accuracy, serving as the foundation for describing the relationship between R-D and the coding parameters. It is widely acknowledged that transform coefficients exhibit a symmetric distribution with a peak at zero. Fig. 1 shows the distribution of the transform coefficients of a typical B frame from the sequence "BasketballDrill", with the zero point included and excluded, respectively.

Fig. 1. Distribution of transform coefficients of a B frame in sequence "BasketballDrill" under LDB configuration. (a) Zero included; (b) Zero excluded.

We observe a symmetric distribution with a peak at the zero point, and the distribution decreases rapidly as the coefficients deviate from zero. This peak at zero motivates us to develop a composite distribution that models the zero and non-zero coefficients separately, in an effort to achieve higher fitting accuracy. Previous research [32] indicates that the Cauchy distribution is efficient in approximating the distribution of DCT coefficients. In the proposed distribution, we adopt a composite modelling strategy that combines the zero peak with a discrete Cauchy distribution for the non-zero coefficients,

$$\rho(n) = \begin{cases} \dfrac{\alpha}{n^2+\beta}, & n \in \mathbb{Z},\ n \neq 0,\\[4pt] p, & n = 0, \end{cases} \tag{9}$$

where α and β are distribution parameters, p is the probability of a zero coefficient, and n denotes the coefficient level. Including zero coefficients in the fit may cause a local minimum during parameter estimation. The proposed distribution therefore excludes the influence of the zero point, ensuring higher accuracy for the non-zero parts.

Since the proposed probability model sums to one, the inherent relationship between α and β can be derived as follows,

$$\sum_{n \in \mathbb{Z},\, n \neq 0} \rho(n) = 1 - p, \qquad 2\sum_{n=1}^{\infty} \frac{\alpha}{n^2+\beta} = 1 - p,$$
$$\alpha = \frac{(1-p)\,\beta\,\tanh\!\left(\sqrt{\beta}\,\pi\right)}{\sqrt{\beta}\,\pi - \tanh\!\left(\sqrt{\beta}\,\pi\right)}. \tag{10}$$

The last step uses the series $\sum_{n=1}^{\infty} \frac{1}{n^2+\beta} = \frac{\sqrt{\beta}\,\pi\coth(\sqrt{\beta}\,\pi) - 1}{2\beta}$. In practical implementation, the parameter β is obtained by searching within a given range, aiming to minimize the mean squared error between the modelled and actual distributions of transform coefficients.

We compare the fitting accuracy of the proposed model with that of the Laplacian distribution and the traditional Cauchy distribution, using the Kullback-Leibler (KL) divergence [33]. Given an actual coefficient distribution $f_r$ and a statistical model $f_p$, the associated KL divergence is calculated as

$$D_{KL} = \sum_{n} f_r(n)\log\left(\frac{f_r(n)}{f_p(n)}\right). \tag{11}$$

The analysis uses the video sequences "BasketballDrill" and "BQMall" with the LDB configuration. Transform coefficients of the 16th frame are extracted from both sequences, and the corresponding KL divergences are shown in Table I. Compared with the traditional distributions, the proposed model achieves higher fitting accuracy for the non-zero parts. The KL divergence between the raw data and the proposed model is much lower than that of the traditional models. Fig. 2 illustrates the comparison among the three distribution models, showing that the proposed model better handles both the zero-level and the non-zero coefficients.

TABLE I
COMPARISON OF D_KL REGARDING LAPLACIAN, CAUCHY, AND THE PROPOSED DISCRETE CAUCHY DISTRIBUTION

Sequence                   Laplacian   Cauchy   Proposed
BasketballDrill, QP=23     0.7923      0.3224   0.0591
BasketballDrill, QP=28     2.0977      0.4552   0.0465
BQMall, QP=23              0.1286      0.1067   0.0677
BQMall, QP=28              0.8162      0.3144   0.0461

IV. RATE AND DISTORTION MODELS
In this section, we develop an analytical framework to explore the relationships among rate, distortion and coding parameters based on the proposed composite coefficient distribution model. In particular, we develop the R-Q and D-Q models that serve as the foundation of the proposed rate control scheme.

A. R-Q Model
Herein, we use hard-decision quantization to approximate the dependent quantization process for simplicity [25]. Given a transform coefficient c and quantization step size Q, the quantization level l is derived as

$$l = \left\lfloor \frac{c}{Q} + \gamma \right\rfloor, \tag{12}$$

where γ is the rounding offset, whose value depends on whether the current slice is an I-slice or a B-/P-slice [34]. According to the coefficient distribution model in Eqn. (9), the probability of the N-th quantization level is calculated as

$$P_N(Q) = \sum_{n=\lfloor NQ-\gamma Q\rfloor}^{\lfloor (N+1)Q-\gamma Q\rfloor} \frac{\alpha}{n^2+\beta},\qquad P_{-N}(Q) = \sum_{n=\lfloor -(N+1)Q+\gamma Q\rfloor}^{\lfloor -NQ+\gamma Q\rfloor} \frac{\alpha}{n^2+\beta},\qquad P_0(Q) = 1 - 2\sum_{N=1}^{L_{max}} P_N(Q), \tag{13}$$

where $L_{max}$ is the maximum quantization level and N is an integer ranging from 1 to $L_{max}$. For convenience of calculation, a definite integral can be used to approximate $P_N$:

$$P_N(Q) \approx \int_{NQ-\gamma Q}^{(N+1)Q-\gamma Q} \frac{\alpha}{x^2+\beta}\,dx = \frac{\alpha}{\sqrt{\beta}}\left[\arctan\left(\frac{(N+1)Q-\gamma Q}{\sqrt{\beta}}\right) - \arctan\left(\frac{NQ-\gamma Q}{\sqrt{\beta}}\right)\right]. \tag{14}$$

The entropy of the quantized coefficients can be formulated as [25]

$$H(Q) = \sum_{N=-L_{max}}^{L_{max}} -P_N(Q)\log_2 P_N(Q) = -P_0(Q)\log_2 P_0(Q) + 2\sum_{N=1}^{L_{max}} -P_N(Q)\log_2 P_N(Q). \tag{15}$$

Here, H(Q) is a monotonically decreasing function of Q, as shown in Fig. 3.

Subsequently, we perform the actual entropy coding and examine the relationship between the estimated entropy and the actual coding bits for five test sequences, as shown in Fig. 4. In particular, the coding information of the 16th frame is extracted from these sequences, with QPs set to 23, 28, 33 and 38. An approximately linear relationship between the estimated entropy and the actual number of output coding bits can be observed.
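The Python sketch below shows how Eqns. (10) and (13)–(15) combine. It computes α from (β, p), evaluates the integral approximation of $P_N(Q)$, and returns the entropy H(Q) in bits per coefficient. All parameter values in the usage lines are illustrative assumptions, including the rounding offset γ. In the proposed scheme, β would come from the range search described in Section III, and the truncation `l_max` is an arbitrary choice made here.

```python
import math


def alpha_from_beta(beta: float, p0: float) -> float:
    """Eqn. (10): scale so the non-zero levels carry probability 1 - p0."""
    s = math.sqrt(beta) * math.pi
    t = math.tanh(s)
    return (1.0 - p0) * beta * t / (s - t)


def level_prob(n: int, q: float, alpha: float, beta: float, gamma: float) -> float:
    """Eqn. (14): arctan closed form of P_N(Q) for a positive level n."""
    rb = math.sqrt(beta)
    lo = n * q - gamma * q          # lower edge of the level-n interval
    hi = (n + 1) * q - gamma * q    # upper edge of the level-n interval
    return alpha / rb * (math.atan(hi / rb) - math.atan(lo / rb))


def entropy_bits(q: float, beta: float, p0: float, gamma: float,
                 l_max: int = 1000) -> float:
    """Eqn. (15): entropy of the quantized levels, in bits per coefficient."""
    alpha = alpha_from_beta(beta, p0)
    pos = [level_prob(n, q, alpha, beta, gamma) for n in range(1, l_max + 1)]
    # Eqn. (13): P_0 = 1 - 2 * sum(P_N). Clamped at 0 because the integral
    # approximation can overshoot the discrete mass when Q is very small.
    p_zero = max(0.0, 1.0 - 2.0 * sum(pos))
    h = -p_zero * math.log2(p_zero) if p_zero > 0.0 else 0.0
    # Levels +N and -N are symmetric, hence the factor 2.
    h += 2.0 * sum(-p * math.log2(p) for p in pos if p > 0.0)
    return h


if __name__ == "__main__":
    for q in (2.0, 4.0, 8.0, 16.0):
        print(q, round(entropy_bits(q, beta=1.0, p0=0.6, gamma=1 / 6), 4))
```

The loop should print entropies that shrink as Q grows, consistent with the trend in Fig. 3. Eqns. (16)–(17) then map H(Q) to bits through the slope φ and intercept ψ.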
As such, the coding bits of the current frame can be estimated as

$$\hat{R}(Q) = \varphi\cdot H(Q) + \psi, \tag{16}$$

where the slope φ characterizes the relationship between the actual coding bits of the residuals and the entropy, and the intercept ψ is determined by the header bits of the current frame. However, these parameters cannot be obtained before encoding the current frame, so we infer them from the previously coded frame at the same level. In particular,

$$\varphi = \frac{R_p - r_{hp}}{H(Q_p)}, \qquad \psi = r_{hp}, \tag{17}$$

where $R_p$ denotes the actual output bits (per pixel) of the previously coded frame, $Q_p$ is that frame's quantization step size, and $r_{hp}$ denotes its header bits, also measured in bits per pixel. Given the target rate, the corresponding QP is obtained by locating the Q that minimizes the difference between the frame-level target bits $R^t_i$ and the estimated encoding bits $\hat{R}(Q)$.

Fig. 2. Comparisons among the actual, Laplacian, Cauchy and proposed composite distributions for sequences "BasketballDrill" and "BQMall". (a) "BasketballDrill", QP = 23. (b) "BasketballDrill", QP = 28. (c) "BQMall", QP = 23. (d) "BQMall", QP = 28.

Fig. 3. Illustration of the relationship between quantization step size and estimated entropy of residuals.

Fig. 4. Illustration of the relationship between estimated entropy of residuals and coding bits (per pixel).

B. D-Q Model
Given the quantization step size Q, the quantization distortion in terms of mean square error (MSE) can be estimated as follows,

$$D_0(Q) = \sum_{n=\lfloor -Q+\gamma Q\rfloor}^{\lfloor Q-\gamma Q\rfloor} n^2\cdot\frac{\alpha}{n^2+\beta},\quad D_N(Q) = \sum_{n=\lfloor NQ-\gamma Q\rfloor}^{\lfloor (N+1)Q-\gamma Q\rfloor} (n-NQ)^2\,\frac{\alpha}{n^2+\beta},\quad D_{-N}(Q) = \sum_{n=\lfloor -(N+1)Q+\gamma Q\rfloor}^{\lfloor -NQ+\gamma Q\rfloor} (n+NQ)^2\,\frac{\alpha}{n^2+\beta}. \tag{18}$$

For simplicity, Eqn. (18) can be approximated by definite integrals:

$$D_0(Q) \approx \int_{-(Q-\gamma Q)}^{Q-\gamma Q} x^2\cdot\frac{\alpha}{x^2+\beta}\,dx = 2\alpha(Q-\gamma Q) - 2\alpha\sqrt{\beta}\,\arctan\left(\frac{Q-\gamma Q}{\sqrt{\beta}}\right),$$
$$D_N(Q) \approx \int_{NQ-\gamma Q}^{(N+1)Q-\gamma Q} (x-NQ)^2\cdot\frac{\alpha}{x^2+\beta}\,dx = \alpha Q + \Psi_1(Q,N) - \Psi_2(Q,N), \tag{19}$$

where

$$\Psi_1(Q,N) = \frac{\alpha N^2Q^2 - \alpha\beta}{\sqrt{\beta}}\cdot\arctan\left(\frac{Q\sqrt{\beta}}{\beta + (NQ-\gamma Q)(NQ+Q-\gamma Q)}\right), \tag{20}$$

$$\Psi_2(Q,N) = \alpha NQ\cdot\ln\left(\frac{(NQ+Q-\gamma Q)^2+\beta}{(NQ-\gamma Q)^2+\beta}\right). \tag{21}$$

As such, the total distortion can be formulated as

$$D(Q) = D_0(Q) + 2\sum_{N=1}^{L_{max}} D_N(Q). \tag{22}$$

Fig. 5 shows the relationship between Q and D(Q), which further verifies that D(Q) is a monotonically increasing function of Q.

Fig. 5. Relationship between Q and estimated distortion D.

In real encoding scenarios, the model must compensate for the influence of the loop filters, dependent quantization and SKIP-coded blocks. To do so, the distortion $\hat{D}$ of the current frame is estimated by adapting the distortion information of the previously coded frame,

$$\hat{D}(Q) = \frac{D_{nsp}}{D(Q_p)}\cdot D(Q)\cdot(1-P_{sp}) + P_{sp}\cdot D_{sp}. \tag{23}$$

Here, $Q_p$ is the quantization step size of the previously coded frame, and $D_{nsp}$ is the distortion of its non-SKIP-coded blocks. For SKIP-coded blocks, we assume that the associated coding bits are zero and denote the incurred distortion as $D_{sp}$. $P_{sp}$ is the ratio of SKIP-coded blocks within the previously coded frame, measured in pixels.

V. THE PROPOSED RATE CONTROL
In this section, the rate control scheme is presented based on the proposed $\hat{R}$-Q and $\hat{D}$-Q models. First, we design the bit allocation scheme at the GOP level and the frame level, with a thorough investigation of the inter-frame dependencies. Subsequently, we present the derivation of coding parameters given the target bit-rate. Finally, the initialization and clipping strategies for the coding parameters are discussed.

A. Bit Allocation

1) GOP Level Bit Allocation:
Given the target bit-rate of a sequence $R^t_{seq}$, the ideal output bits for each GOP are derived as

$$R^t_{gop} = \frac{R^t_{seq}}{N_{GOP}}, \tag{24}$$

where $N_{GOP}$ denotes the number of GOPs in the sequence. The actual output bits may deviate from the target bits because of diverse video content, so we employ a sliding window [13] to smooth the output bits. If the encoded frames consume more bits than planned, the sliding window decreases the target bits for the following GOPs within the window accordingly, and vice versa. As such, the target bits for the g-th GOP are derived as

$$R^t_{gop,g} = R^t_{gop} - \frac{R_{cost} - R^t_{gop}\cdot N_{coded}}{N_{SW}}, \tag{25}$$

where $R_{cost}$ denotes the bits consumed by all encoded frames, $N_{SW}$ represents the size of the sliding window, and $N_{coded}$ is the number of frames that have already been encoded.

Fig. 6. Frame structures of RA and LD in VVC [35]. (a) RA structure with GOP size equal to 16. (b) LD structure with GOP size equal to 4.
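Eqns. (24) and (25) amount to a running budget correction. The Python sketch below assumes that `n_coded` counts already-coded GOPs, so that $R^t_{gop}\cdot N_{coded}$ is the budget those GOPs were expected to consume and the units of Eqn. (25) agree. This unit choice is an assumption of the sketch, and the function name is hypothetical.

```python
def gop_target_bits(r_seq: float, n_gop: int, r_cost: float,
                    n_coded: int, window: int) -> float:
    """Sliding-window GOP budget following Eqns. (24)-(25).

    r_seq:   total target bits for the sequence
    n_gop:   number of GOPs in the sequence
    r_cost:  bits actually spent so far
    n_coded: GOPs already coded (assumed unit, see the lead-in)
    window:  sliding-window size N_SW
    """
    if n_gop <= 0 or window <= 0:
        raise ValueError("n_gop and window must be positive")
    r_gop = r_seq / n_gop                    # Eqn. (24)
    overspend = r_cost - r_gop * n_coded     # positive if we spent too much
    return r_gop - overspend / window        # Eqn. (25)


if __name__ == "__main__":
    # 1000 bits over 10 GOPs; after 3 GOPs we have spent 340 instead of 300,
    # so a 4-GOP window trims the next GOP's target from 100 to 90.
    print(gop_target_bits(1000.0, 10, 340.0, 3, 4))
```

Spreading the overspend over $N_{SW}$ GOPs rather than charging it all to the next GOP keeps the per-GOP targets smooth, which is the motivation the text gives for the sliding window.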
2) Frame Level Bit Allocation:
Fig. 6 shows two typical GOP structures in VVC and their hierarchical referencing relationship. For frame-level bit allocation, the inter-frame dependencies are fully considered. More specifically, due to inter prediction in P- and B-frames, there exist quality dependencies between the reference frame and the current to-be-coded frame. One widely accepted view is that frames in lower temporal layers (i.e., level 0) may have more significant influence on subsequently coded frames, so they deserve more coding bits. In turn, fewer coding bits are assigned to frames in higher temporal layers. As such, the importance of different frames can be discriminated according to the referencing relationship as well as the video content. In the literature, how reference frames affect the to-be-coded frame has been intensively investigated [14, 36–38]. These studies noticed a linear relationship between the coding distortions of the reference frame and the current frame. Moreover, the existing schemes are typically developed based on the strong assumption that the coding bits of the reference frame have negligible influence on the output bits of the current frame. New coding tools have since been adopted in VVC. In this paper, we therefore revisit this problem based on new statistics collected in VTM-3.0 [3], exploring the rate and distortion characteristics of the reference frame and the current to-be-coded frame.

As illustrated in Fig. 7, the quality of the reference frame influences both the distortion and the coding bits of the current frame.

Fig. 7. Illustration of the rate and distortion dependencies between the reference frame and the current coded frame. The x-axis denotes the MSE of the reference frame. The left and the right y-axis represent the output coding bits (per pixel) and the MSE of the current coded frame, respectively. (a) "BasketballDrill" (b) "ChinaSpeed" (c) "BQMall" (d) "RaceHorses".
More specifically, four sequences are involved in the investigation under the LDB configuration. For the current to-be-coded frame, the associated QP is fixed to 40. Meanwhile, the QP of the reference frame varies from 30 to 43 to generate references of different quality levels. Fig. 7 plots the corresponding output bits and distortions of the current frame against the varying quality of the reference frame. We observe that both the distortion and the coding bits of the current frame increase as the distortion of the reference frame increases. Moreover, an increase in the distortion of the reference frame initially leads to a linear increase in the distortion of the current frame. This trend flattens once the distortion of the reference frame reaches a certain level. The output coding bits (per pixel) of the current frame vary smoothly when the reference frame is of high quality and increase sharply when the reference frame is severely distorted. These observations contrast with existing models, in which the quality of the reference frame influences only the distortion of the current to-be-coded frame.

Considering the influences on both distortion and coding bits, there exists an approximately linear relationship between the distortion of the reference frame and the RD cost of the current frame, as shown in Fig. 8. As such, we define the dependency factor $\pi_{ij}$ between reference frame j and encoding frame i as

$$\pi_{ij} = \frac{dJ_i}{dD_j}, \tag{26}$$

where $J_i$ denotes the RD cost of the encoding frame and $D_j$ represents the distortion of the reference frame.

Fig. 8. Illustration of the relationship between the distortion of the reference frame and the RD cost of the current coded frame. (a) "BasketballDrill" (b) "ChinaSpeed" (c) "BQMall" (d) "RaceHorses".

Typically, the total RD cost of a GOP is formulated as the sum of the RD costs of its frames. The distortion and coding bits of each frame, characterized by Eqn. (16) and Eqn. (23), are highly dependent on the estimated distribution parameters. Due to the chicken-and-egg dilemma, we can in practice only use the statistics of the previous frame at the same level to estimate the RD cost of the to-be-encoded frame. However, because the reference frame quality also matters, a straightforward estimation of the distribution parameters may lead to inaccurate modelling of the RD cost. In particular, we denote the distortion of the reference frame used by the previous frame as $D^{j_p}_p$, where $j_p$ belongs to the previous frame's reference list. The actual quality of the current reference frame deviates from $D^{j_p}_p$, which biases the estimated RD cost. To compensate for the RD cost difference introduced by quality fluctuation of the reference frames, the RD cost of each frame is formulated as the sum of three terms. These are an internal RD cost $J^i_{in}$, an external RD cost $J^i_{ex}$ and a constant RD cost $J^i_c$. In particular, $J^i_{in}$ is derived from Eqn. (16) and Eqn. (23). $J^i_{ex}$ is incurred by the difference between $D^{j_p}_p$ and the distortion of the reference frames within the current GOP, so it can be represented as a function of $\hat{D}_j(Q_j) - D^{j_p}_p$. $J^i_c$ is brought by the difference between $D^{j_p}_p$ and the distortion of the reference frames outside the current GOP; in other words, it can be regarded as a constant. As such, supposing there are $N_f$ frames in the current GOP, the total RD cost of a GOP can be written as

$$J_{total} = \sum_{i=1}^{N_f}\left(J^i_{in} + J^i_{ex} + J^i_c\right) = \sum_{i=1}^{N_f}\left[\left(\hat{D}_i(Q_i) + \lambda_{GOP}\hat{R}_i(Q_i)\right) + \sum_{j} J^i_{ex}\left(\hat{D}_j(Q_j) - D^{j_p}_p\right) + J^i_c\right], \tag{27}$$

where j indexes the reference list of the encoding frame, and $Q_i$ and $Q_j$ denote the quantization step sizes of the current frame and reference frame j, respectively. As proved in the Appendix, Eqn. (27) can be rewritten as

$$J_{total} = \sum_{i=1}^{N_f}\left[\left(\hat{D}_i(Q_i) + \lambda_{GOP}\hat{R}_i(Q_i)\right) + \sum_{k} J^k_{ex}\left(\hat{D}_i(Q_i) - D^{i_p}_p\right) + J^i_c\right] = \sum_{i=1}^{N_f} J_i(Q_i). \tag{28}$$

Here, k indexes the list of frames that use the current frame i as a reference, and $D^{i_p}_p$ is the distortion of the previous frame of frame i. $J_i$ is the sum of the internal RD cost of a frame and its influence on other frames. To minimize the total RD cost $J_{total}$ of a GOP, we need to find the optimal $Q_i$ for each frame. Since $J_i$ is a function of $Q_i$ and $Q_i$ is an independent parameter, $J_i$ of frame i is independent of the other frames' QPs. Minimizing $J_{total}$, the sum of the $J_i$, therefore amounts to minimizing each $J_i$ individually. As such, we compute the partial derivative of $J_i$ with respect to $Q_i$ and set it to 0:

$$\frac{\partial J_i}{\partial Q_i} = \left(\frac{\partial \hat{D}_i(Q_i)}{\partial Q_i} + \lambda_{GOP}\frac{\partial \hat{R}_i(Q_i)}{\partial Q_i}\right) + \sum_k \frac{\partial J^k_{ex}\left(\hat{D}_i(Q_i) - D^{i_p}_p\right)}{\partial \hat{D}_i(Q_i)}\cdot\frac{\partial \hat{D}_i(Q_i)}{\partial Q_i} = 0. \tag{29}$$

The earlier analysis showed an approximately linear relationship between the distortion of the reference frame and the RD cost of the current encoding frame. Substituting Eqn. (26) into Eqn. (29) on this basis yields

$$\frac{\partial J_i}{\partial Q_i} = \left(\frac{\partial \hat{D}_i(Q_i)}{\partial Q_i} + \lambda_{GOP}\frac{\partial \hat{R}_i(Q_i)}{\partial Q_i}\right) + \frac{\partial \hat{D}_i(Q_i)}{\partial Q_i}\sum_k \pi_{ki} = \kappa_i\frac{\partial \hat{D}_i(Q_i)}{\partial Q_i} + \lambda_{GOP}\frac{\partial \hat{R}_i(Q_i)}{\partial Q_i} = 0, \tag{30}$$

where $\kappa_i$ is the influence factor,

$$\kappa_i = 1 + \sum_k \pi_{ki}. \tag{31}$$

The influence factor reveals the importance of a frame: frames with higher $\kappa_i$ have a greater impact on other frames and deserve more coding bits. In this optimization problem, the whole GOP shares the same $\lambda_{GOP}$,

$$\lambda_{GOP} = -\kappa_i\cdot\frac{\partial \hat{D}_i(Q_i)/\partial Q_i}{\partial \hat{R}_i(Q_i)/\partial Q_i}. \tag{32}$$

Evaluating Eqn. (32) requires the derivatives of Eqn. (16) and Eqn. (23). However, the complexity of Eqn. (15) and Eqn. (19) makes it difficult to obtain analytical R-Q and D-Q relationships.
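Eqns. (31)–(32), together with the candidate search that Algorithm 1 formalizes below, can be read as a λ-matching loop. Each candidate QP of the level-1 anchor frame fixes a GOP-wide λ. Every frame then picks the candidate whose κ-weighted slope ratio is closest to that λ, and the QP list whose predicted bits best match the GOP budget wins. The Python sketch below is illustrative only. It assumes the HEVC/VVC QP-to-Q convention and takes the derivative functions as callables. In place of the hyperbolic fits described next, it uses toy models $\hat{R} = a/Q$ and $\hat{D} = Q^2/12$, the latter echoing Eqn. (1). `FrameModel`, `power_law_frame` and `allocate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


def qstep(qp: int) -> float:
    """Assumed HEVC/VVC convention: Q doubles every 6 QP, Q = 1 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def kappa_from_dependencies(pis: Sequence[float]) -> float:
    """Eqn. (31): kappa_i = 1 + sum of pi_ki over frames k referencing i."""
    return 1.0 + sum(pis)


@dataclass
class FrameModel:
    kappa: float                      # influence factor, Eqn. (31)
    rate: Callable[[float], float]    # R_hat_i(Q)
    d_rate: Callable[[float], float]  # dR_hat_i/dQ (negative)
    d_dist: Callable[[float], float]  # dD_hat_i/dQ (positive)
    qp_candidates: Sequence[int]      # e.g. QP_ip - 3 ... QP_ip + 3


def frame_lambda(f: FrameModel, qp: int) -> float:
    """Eqns. (32)-(34): lambda = -kappa * D'(Q) / R'(Q) at this QP."""
    q = qstep(qp)
    return -f.kappa * f.d_dist(q) / f.d_rate(q)


def allocate(frames: List[FrameModel], anchor: int,
             r_gop: float) -> Tuple[List[int], List[float]]:
    """Algorithm 1 sketch: return the chosen QPs and per-frame target bits."""
    best_err, best_qps = float("inf"), []
    for qp_v in frames[anchor].qp_candidates:          # loop over v
        lam_v = frame_lambda(frames[anchor], qp_v)     # Eqn. (33)
        qps = [min(f.qp_candidates,                    # Eqn. (35)
                   key=lambda qp: abs(frame_lambda(f, qp) - lam_v))
               for f in frames]
        total = sum(f.rate(qstep(qp)) for f, qp in zip(frames, qps))
        err = abs(total - r_gop)                       # Eqn. (36)
        if err < best_err:
            best_err, best_qps = err, qps
    bits = [f.rate(qstep(qp)) for f, qp in zip(frames, best_qps)]  # Eqn. (37)
    return best_qps, bits


def power_law_frame(kappa: float, a: float, qp_prev: int) -> FrameModel:
    """Toy model: R = a / Q, D = Q**2 / 12, candidates qp_prev-3..qp_prev+3."""
    return FrameModel(kappa=kappa,
                      rate=lambda q: a / q,
                      d_rate=lambda q: -a / q ** 2,
                      d_dist=lambda q: q / 6.0,
                      qp_candidates=range(qp_prev - 3, qp_prev + 4))


if __name__ == "__main__":
    # Anchor frame (kappa = 2) is referenced heavily; the other is not.
    frames = [power_law_frame(2.0, 100.0, 32), power_law_frame(1.0, 100.0, 32)]
    budget = 100.0 / qstep(32) + 100.0 / qstep(34)
    print(allocate(frames, anchor=0, r_gop=budget))
```

With these toy models, the heavily referenced anchor keeps QP 32 while the other frame is pushed two QP steps higher. Setting κ = 2 doubles the anchor's weight on distortion, which is the qualitative behaviour Eqn. (31) is meant to capture.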
In [27], hyperbolic functions are used to model the Cauchy-distribution-based R-Q and D-Q relationships. Inspired by this method, we sample different pairs $\{Q, \hat{R}(Q)\}$ and $\{Q, \hat{D}(Q)\}$ and fit them with hyperbolic functions. The derivatives of the two fitted models, denoted as $\hat{R}'(Q)$ and $\hat{D}'(Q)$, are used to approximate the derivatives of Eqn. (16) and Eqn. (23). For frame $i$, the QP candidates range from $QP_i^{p}-3$ to $QP_i^{p}+3$, where $QP_i^{p}$ denotes the QP used to encode the previous frame. Given the derivatives of the R-Q and D-Q models, Algorithm 1 searches for the bits allocated to each frame so that the RD performance is optimized while the bit-rate budget is satisfied.

Algorithm 1 Optimal bit allocation.
Input: Frame $i$'s QP candidate list $QP_i^{1}, QP_i^{2}, \ldots, QP_i^{7}$ and the corresponding RD models within the current GOP.
Output: Target bit-rate $R_i^{t}$ for each frame $i$ in the current GOP.
for $v$ = 1 to 7 do
  Step 1: Suppose the level-1 frame is the $j$-th frame within the GOP, its $v$-th candidate QP is $QP_j^{v}$, and the corresponding quantization step size is $Q_j^{v}$. The slopes of the R-Q and D-Q curves at $Q_j^{v}$ are $\hat{R}'_j(Q_j^{v})$ and $\hat{D}'_j(Q_j^{v})$, respectively. Denoting them as $\hat{R}'_{jv}$ and $\hat{D}'_{jv}$, we define
  $$\lambda_{GOP}^{v} = -\kappa_j\cdot\frac{\hat{D}'_{jv}}{\hat{R}'_{jv}}. \tag{33}$$
  Step 2: Select the optimal QP for frame $i$ from its candidate list $QP_i^{1}, QP_i^{2}, \ldots, QP_i^{7}$:
  $$\lambda_u = -\kappa_i\cdot\frac{\hat{D}'_{iu}}{\hat{R}'_{iu}}, \quad u = 1, \ldots, 7, \tag{34}$$
  $$u_{iv} = \arg\min_{u}\left|\lambda_u - \lambda_{GOP}^{v}\right|. \tag{35}$$
  $QP_i^{u_{iv}}$ is selected as the optimal QP of frame $i$ and stored in a QP list.
  Step 3: The $v$-th QP list is $QP_1^{u_{1v}}, QP_2^{u_{2v}}, \ldots, QP_{N_f}^{u_{N_f v}}$.
end for
Step 4: Obtain the target bits for each frame. Let the quantization step sizes corresponding to the $v$-th QP list be $Q_1^{u_{1v}}, Q_2^{u_{2v}}, \ldots, Q_{N_f}^{u_{N_f v}}$. Using Eqn. (16), the index $v_o$ of the optimal QP list is obtained as
  $$v_o = \arg\min_{v}\left|\sum_{i=1}^{N_f}\hat{R}_i\big(Q_i^{u_{iv}}\big) - R_{GOP}^{t}\right|, \quad v = 1, \ldots, 7. \tag{36}$$
  The bits allocated to frame $i$ are
  $$R_i^{t} = \hat{R}_i\big(Q_i^{u_{i v_o}}\big). \tag{37}$$

B. Coding Parameters Derivation
After obtaining the target bit-rate $R_i^{t}$, the coding parameters $\lambda_i$ and $QP_i$ can be derived from Eqn. (16). Given the QP candidate list of frame $i$, the quantization step size $Q_i$ is calculated as

$$
Q_i = \arg\min_{Q}\left|\hat{R}_i(Q) - R_i^{t}\right|, \quad QP \in \{QP_i^{p}-3, QP_i^{p}-2, \ldots, QP_i^{p}+3\}, \tag{38}
$$

where $Q$ is the quantization step size corresponding to $QP$. Theoretically, $\lambda$ is the negative slope of the RD curve, which can be derived as

$$
\lambda_i^{t}(Q) = -\frac{\partial D_i}{\partial R_i} = -\frac{\partial(\hat{D}_i + D_e)/\partial Q}{\partial(\hat{R}_i + R_e)/\partial Q} = -\frac{\hat{D}'_i(Q)}{\hat{R}'_i(Q)}, \tag{39}
$$

where $D_e$ and $R_e$ denote the differences in distortion and bit-rate caused by the discrepancy in reference frame quality, which can be regarded as constants. Moreover, the coding information of the three previous frames is collected to ensure a stable Q-λ relationship. Let $\{Q_p^{m}, \lambda_p^{m}\}$ denote the quantization step size and $\lambda$ of the $m$-th previous frame on the same level; the stability is measured by

$$
\Gamma_m = \frac{\lambda_p^{m}}{\lambda_i^{t}(Q_p^{m})}, \quad 1 \le m \le 3. \tag{40}
$$

A value of $\Gamma_m$ close to 1 indicates that the Q-λ relationship derived from Eqn. (39) is stable. $\Gamma_m$ is further used to scale $\lambda_i^{t}(Q_i)$, such that $\lambda_i$ is obtained as

$$
\lambda_i = \frac{\sum_{m=1}^{3}\tau_m\cdot\Gamma_m}{\sum_{m=1}^{3}\tau_m}\cdot\lambda_i^{t}(Q_i), \tag{41}
$$

where $\tau_m$ is a predefined weight equal to 5, 3 and 1 for $m$ = 1, 2 and 3, respectively.

C. Initial Value and Parameter Clip
The proposed rate control scheme is applied to P and B slices. In the implementation, the first frame of each level is coded with the default rate control algorithm. For the first 32 frames, a fixed-ratio bit allocation scheme is applied to train stable coding parameters for the adaptive bit allocation. For bit allocation under the RA structure, we assume that frames in the same temporal level share an identical influence factor $\kappa_i$, whose values are listed in Table II. The LD configuration involves simpler reference relationships and a smaller GOP size, so the influence factor is more sensitive to the coding bits. We therefore define four sets of influence factors for each frame in the LD configuration according to the bits per pixel (bpp), as shown in Table III, where $I_G$ is a positive integer. To cater to the original GOP structure, we impose additional QP restrictions, listed in Table IV and Table V, where $QP_p^{(z)}$ denotes the QP of the previously encoded frame at the $z$-th frame level.

VI. EXPERIMENTAL RESULTS
The proposed rate control algorithm is implemented on the VVC test model VTM-3.0 [39]. Extensive experiments are conducted to verify the effectiveness of the proposed method under the common test conditions (CTCs) [40] with the LDB (GOP size = 4) and RA (GOP size = 16) configurations. QPs are set to 22, 27, 32 and 37. Details of the recommended test sequences are summarized in Table VI. Experiments are executed on a dual Intel Xeon CPU E5-2620 platform without parallelism. The original VTM-3.0 without rate control is used to encode the test sequences following the CTCs, and its output bit-rate is used as the target bit-rate for rate control. Compression performance is measured with BD-Rate [41], where a negative BD-Rate denotes a performance improvement. In addition, the bit-rate error BitErr is calculated to measure the rate control accuracy:

$$
BitErr = \frac{|R_o - R_t|}{R_t}\times 100\%, \tag{42}
$$

where $R_t$ denotes the target bit-rate and $R_o$ is the corresponding output bit-rate.

TABLE II
INFLUENCE FACTOR FOR RA
Frame Level    Influence Factor
1              5.4082
2              2.3958
3              1.5933
4              1.1566
5              1

TABLE III
INFLUENCE FACTOR FOR LDB
[Influence factors for POC ID 4·I_G − 3, 4·I_G − 2, 4·I_G − 1 and 4·I_G, given for four bpp intervals]

A. Results and Analyses
Table VII shows the coding performance of the proposed rate control algorithm under the LDB and RA configurations. The original VTM-3.0 anchor without rate control (fixed-QP) and the default frame-level rate control algorithm in VTM-3.0 are both employed as benchmarks. As required by [40], class D is excluded from the overall average. Compared with the default rate control algorithm, the proposed scheme brings 1.03% and 1.29% BD-Rate savings on average under the LDB and RA configurations, respectively. The gains are larger on high-resolution videos, which provide more valid samples for modelling and hence higher fitting accuracy. Compared with the fixed-QP coding scheme, the proposed rate control scheme brings 2.96% BD-Rate savings under the LDB configuration and a 4.36% BD-Rate loss under the RA configuration. It is worth mentioning that both the proposed and the default rate control algorithms improve the coding performance over fixed-QP coding under the LDB configuration, and the proposed scheme provides more efficient coding parameters, leading to further coding gains. However, rate control may degrade the RD performance under the RA configuration compared with fixed-QP coding. Fig. 9 shows the RD curves of the sequence “RaceHorses” from class C, from which the RD performance improvement brought by the proposed algorithm can be observed. The encoding complexity of the proposed rate control scheme is tabulated in the last row of Table VII: the proposed algorithm moderately increases the computational complexity, by around 20% compared with both the default rate control algorithm and the original anchor.

TABLE IV
QP CLIPS FOR LD CONFIGURATION
[Lower and upper QP bounds for each frame level, expressed relative to QP_p^{(z)}]

TABLE V
QP CLIPS FOR RA CONFIGURATION
[Lower and upper QP bounds for each frame level, expressed relative to QP_p^{(z)}]

Table VIII lists the average bit-rate errors of the proposed and the default rate control under the LDB and RA configurations; the proposed scheme achieves lower bit-rate errors in both. The bit-rate errors of the test sequences “BasketballDrive” and “BQMall” at different target bit-rates are given in Table IX (LDB) and Table X (RA). Compared with the default rate control algorithm, the proposed rate control achieves substantially smaller bit-rate errors under the RA configuration across the target bit-rates, while under the LDB configuration the two algorithms show similar bit-rate errors.

To further demonstrate the benefits of the proposed method, the PSNR and output bit-rate of individual frames of the sequence “BasketballDrill” are extracted under the RA configuration with a target bit-rate of 2856 kbps. Fig. 11 plots the instantaneous PSNR and output bit-rate from POC 60 to POC 92 for both the default and the proposed schemes. The proposed rate control scheme follows a per-frame bit trend similar to that of the default scheme, with key frames such as POC 64 and POC 80 receiving more bits.
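The accuracy and averaging figures in Tables VII to X can be recomputed from their per-sequence entries. The sketch below implements Eqn. (42) and forms an overall BD-Rate as the average of the per-class values weighted by the number of sequences per class in Table VI, with class D excluded. This weighting is an assumption, not stated in the text; with it, the four "Overall" entries of Table VII are reproduced to within the rounding of the per-class values. The names `bit_err`, `overall_bd_rate` and `SEQS_PER_CLASS` are hypothetical.

```python
from typing import Dict, Iterable, Optional


def bit_err(r_out: float, r_target: float) -> float:
    """Eqn. (42): relative bit-rate error in percent."""
    return abs(r_out - r_target) / r_target * 100.0


# Number of sequences per class, taken from Table VI.
SEQS_PER_CLASS = {"A1": 3, "A2": 3, "B": 5, "C": 4, "D": 4, "E": 3}


def overall_bd_rate(per_class: Dict[str, Optional[float]],
                    excluded: Iterable[str] = ("D",)) -> float:
    """Sequence-weighted mean of per-class BD-Rates (the weighting is an assumption).

    Classes marked None (not tested) and excluded classes are skipped.
    """
    skip = set(excluded)
    num = den = 0.0
    for cls, bd in per_class.items():
        if bd is None or cls in skip:
            continue
        n = SEQS_PER_CLASS[cls]
        num += n * bd
        den += n
    return num / den


if __name__ == "__main__":
    # Table X, "BasketballDrive" at the lowest target bit-rate, default rate control.
    print(f"{bit_err(1248.451, 1102.946):.3f}%")   # prints 13.192%
    # Table VII, RA configuration with the default rate control as anchor.
    ra_default = {"A1": -3.03, "A2": -0.15, "B": -0.91, "C": -1.32, "D": -1.16, "E": None}
    print(f"{overall_bd_rate(ra_default):.2f}%")   # prints -1.29%
```

An unweighted mean over classes would give, for example, about -2.77% instead of -2.96% for LDB against fixed-QP, which is why the sequence-count weighting is used here.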
Owing to the proper bit allocation, the proposed scheme also achieves higher PSNR than the default rate control scheme in Fig. 11, especially on the key frames, leading to an overall performance improvement. Fig. 10 illustrates the output bits per second for the sequence “RitualDance” under the LDB and RA configurations, where the target bit-rates are set to 2876 kbps and 2467 kbps, respectively. Compared with the default rate control algorithm, the output bit-rates are more stable with the proposed rate control scheme.

TABLE VI
CHARACTERISTICS OF TEST SEQUENCES
Class   Number of Sequences   Resolution   Frame Rate (fps)   Bit Depth
A1      3                     4K           60/30              10
A2      3                     4K           60/50              10
B       5                     1080p        60/50              8/10
C       4                     WVGA         60/50/30           8
D       4                     WQVGA        60/50/30           8
E       3                     720p         60                 8

Fig. 9. The RD curves of sequence “RaceHorses” (Class C) under LDB and RA configurations. (a) LDB configuration; (b) RA configuration.

VII. CONCLUSION
In this paper, we propose a novel rate control algorithm for VVC based on the composite Cauchy distribution, which achieves superior compression performance compared with the default frame-level rate control algorithm in VTM-3.0. Based on the proposed distribution model, we theoretically derive R-Q and D-Q models that are shown to achieve higher modelling accuracy for the RD characteristics of diverse video content. Furthermore, we explore the frame dependency between different temporal layers, based on which an adaptive bit allocation scheme is established for optimal bit allocation. Compared with the default VTM-3.0 rate control algorithm, owing to proper bit allocation and an accurate Q-λ relationship, the proposed algorithm achieves 1.03% BD-Rate savings under the LDB configuration and 1.29% BD-Rate savings under the RA configuration. Moreover, under the LDB configuration, the proposed algorithm outperforms the fixed-QP coding scheme with 2.96% BD-Rate savings. These results provide evidence for the effectiveness of the proposed rate control algorithm.

TABLE VII
ILLUSTRATION OF THE BD-RATE OF THE PROPOSED RATE CONTROL SCHEME ON VTM-3.0 UNDER LDB AND RA CONFIGURATIONS
             LDB                          RA
             Fixed-QP     Default         Fixed-QP     Default
             as anchor    as anchor       as anchor    as anchor
Class A1     -            -               9.93%        -3.03%
Class A2     -            -               3.49%        -0.15%
Class B      -3.58%       -1.24%          3.76%        -0.91%
Class C      -3.40%       -0.48%          1.58%        -1.32%
Class D      -1.43%       -0.08%          3.30%        -1.16%
Class E      -1.32%       -1.43%          -            -
Overall      -2.96%       -1.03%          4.36%        -1.29%
Enc. time    125%         123%            121%         118%

Fig. 10. Illustration of the actual bits per second for “RitualDance”. (a) LDB configuration where the target bit-rate is set as 2876 kbps. (b) RA configuration where the target bit-rate is set as 2467 kbps.

TABLE VIII
ILLUSTRATION OF THE AVERAGE BIT-RATE ERROR OF THE DEFAULT RATE CONTROL AND THE PROPOSED RATE CONTROL SCHEMES ON VTM-3.0 UNDER LDB AND RA CONFIGURATIONS
             LDB          RA
Proposed     0.3543%      2.177%
Default      0.4158%      2.635%

Fig. 11. Illustration of the instant coding bits and PSNR of sequence “BasketballDrill” with the proposed rate control scheme and the default rate control scheme from POC 60 to POC 92 under RA configuration. The target bit-rate is 2856 kbps. (a) Bit cost; (b) PSNR.

TABLE IX
EXPERIMENTAL RESULTS OF “BASKETBALLDRIVE” AND “BQMALL” UNDER LDB CONFIGURATION
(Bit-rates in kbps, Y-PSNR in dB)
                   Target      Default Rate Control Algorithm       Proposed Rate Control Algorithm
Sequence           Bit-rate    Bit-rate    Y-PSNR    Bit-rate Error  Bit-rate    Y-PSNR    Bit-rate Error
BasketballDrive    17189.78    17185.16    39.4888   0.027%          17167.99    39.5838   0.127%
                   5487.445    5490.41     37.6964   0.054%          5488.855    37.731    0.026%
                   2605.99     2608.773    35.901    0.107%          2607.83     35.9127   0.071%
                   1359.594    1361.379    33.948    0.131%          1359.434    33.9544   0.012%
BQMall             3586.56     3590.396    40.3463   0.107%          3588.333    40.3821   0.049%
                   1565.79     1569.038    37.5914   0.207%          1568.903    37.6226   0.199%
                   771.25      773.8624    34.8017   0.338%          773.0672    34.8623   0.235%
                   394.17      396.492     32.0209   0.590%          397.0336    32.1024   0.728%

TABLE X
EXPERIMENTAL RESULTS OF “BASKETBALLDRIVE” AND “BQMALL” UNDER RA CONFIGURATION
(Bit-rates in kbps, Y-PSNR in dB)
                   Target      Default Rate Control Algorithm       Proposed Rate Control Algorithm
Sequence           Bit-rate    Bit-rate    Y-PSNR    Bit-rate Error  Bit-rate    Y-PSNR    Bit-rate Error
BasketballDrive    14299.21    14303.8     39.4227   0.032%          14297.89    39.4029   0.009%
                   4625.193    4628.515    37.7323   0.072%          4625.287    37.7437   0.002%
                   2185.733    2203.797    35.9003   0.826%          2188.247    36.0319   0.115%
                   1102.946    1248.451    33.8775   13.192%         1123.8808   34.0671   1.898%
BQMall             2894.882    2902.675    40.4221   0.269%          2896.334    40.4188   0.050%
                   1293.245    1297.324    37.9194   0.315%          1294.236    37.9512   0.077%
                   650.956     661.2928    35.36     1.588%          653.367     35.3822   0.370%
                   334.54      352.3384    32.5389   5.320%          349.750     32.8307   4.546%

APPENDIX
PROOF OF EQN. (28)
According to Eqn. (27), we write

$$
\sum_{i=1}^{N_f} J_i^{ex} = \sum_{i=1}^{N_f}\sum_{j} J_i^{ex}\big(\hat{D}_j(Q_j) - D_{j_p}^{p}\big), \tag{43}
$$

where $j$ indexes the reference list of frame $i$ and $Q_j$ denotes the quantization step size of reference frame $j$. We set

$$
A_{ij} = J_i^{ex}\big(\hat{D}_j(Q_j) - D_{j_p}^{p}\big), \tag{44}
$$

where $A_{ij}$ is the external RD cost of frame $i$ caused by the quality fluctuation of frame $j$. We then extend $j$ to the whole GOP. Setting $A_{ij}$ to zero whenever frame $j$ is not in frame $i$'s reference list, Eqn. (43) can be written as

$$
\sum_{i=1}^{N_f} J_i^{ex} = \sum_{i=1}^{N_f}\sum_{j=1}^{N_f} A_{ij}
= \sum\begin{bmatrix} A_{11} & \cdots & A_{1N_f} \\ \vdots & \ddots & \vdots \\ A_{N_f 1} & \cdots & A_{N_f N_f} \end{bmatrix}_{N_f\times N_f}
= \sum_{j=1}^{N_f}\left(A_{1j} + A_{2j} + \cdots + A_{N_f j}\right)
= \sum_{j=1}^{N_f}\sum_{i=1}^{N_f} A_{ij}. \tag{45}
$$

Since $A_{ij}$ is zero whenever frame $j$ is not in frame $i$'s reference list, Eqn. (45) can be written as

$$
\sum_{i=1}^{N_f} J_i^{ex} = \sum_{j=1}^{N_f}\sum_{i=1}^{N_f} A_{ij} = \sum_{j=1}^{N_f}\sum_{k} A_{kj} = \sum_{j=1}^{N_f}\sum_{k} J_k^{ex}\big(\hat{D}_j(Q_j) - D_{j_p}^{p}\big). \tag{46}
$$

Herein, $k$ ranges over the frames that use frame $j$ as a reference frame. Renaming the summation index $j$ as $i$ gives the external-cost term of Eqn. (28).

REFERENCES
[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,”
IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, 2003.
[2] G. J. Sullivan, J. Ohm, W. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, 2012.
[3] B. Bross, J. Chen, and S. Liu, “Versatile video coding (draft 3),” JVET L1001 v9, Oct. 2018.
[4] X. Li, H.-C. Chuang, J. Chen, M. Karczewicz, L. Zhang, X. Zhao, and A. Said, “Multi-type-tree,” Joint Video Exploration Team (JVET), doc. JVET-D0117, 2016.
[5] L. Zhang, K. Zhang, H. Liu, H. C. Chuang, Y. Wang, J. Xu, P. Zhao, and D. Hong, “History-based motion vector prediction in versatile video coding,” in , 2019, pp. 43–52.
[6] S. De-Luxán-Hernández, V. George, J. Ma, T. Nguyen, H. Schwarz, D. Marpe, and T. Wiegand, “An intra subpartition coding mode for VVC,” in , 2019, pp. 1203–1207.
[7] K. Zhang, Y. Chen, L. Zhang, W. Chien, and M. Karczewicz, “An improved framework of affine motion compensation in video coding,” IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1456–1469, 2019.
[8] H. Liu, L. Zhang, K. Zhang, J. Xu, Y. Wang, J. Luo, and Y. He, “Adaptive motion vector resolution for affine-inter mode coding,” in , 2019, pp. 1–4.
[9] L. Zhao, X. Zhao, S. Liu, X. Li, J. Lainema, G. Rath, F. Urban, and F. Racapé, “Wide angular intra prediction for versatile video coding,” in , 2019, pp. 53–62.
[10] X. Zhao, J. Chen, M. Karczewicz, A. Said, and V. Seregin, “Joint separable and non-separable transforms for next-generation video coding,” IEEE Transactions on Image Processing, vol. 27, no. 5, pp. 2514–2525, 2018.
[11] H. Schwarz, T. Nguyen, D. Marpe, and T. Wiegand, “Hybrid video coding with trellis-coded quantization,” in , March 2019, pp. 182–191.
[12] “Coded representation of picture and audio information - MPEG-2 test model 5,” ISO-IEC AVC-491, Apr. 1993.
[13] B. Li, H. Li, L. Li, and J. Zhang, “λ-domain rate control algorithm for high efficiency video coding,” IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 3841–3854, 2014.
[14] L. Li, B. Li, H. Li, and C. W. Chen, “λ-domain optimal bit allocation algorithm for high efficiency video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 1, pp. 130–142, 2018.
[15] Z. He, Y. Kim, and S. K. Mitra, “Low-delay rate control for DCT video coding via ρ-domain source modeling,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 8, pp. 928–940, 2001.
[16] Y. Li and Z. Chen, “Rate control for VVC,” JVET K0390, Jul. 2018.
[17] S. Ma, W. Gao, and Y. Lu, “Rate-distortion analysis for H.264/AVC video coding and its application to rate control,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 12, pp. 1533–1544, 2005.
[18] Z. Chen and X. Pan, “An optimized rate control for low-delay H.265/HEVC,” IEEE Transactions on Image Processing, vol. 28, no. 9, pp. 4541–4552, 2019.
[19] H. Gish and J. Pierce, “Asymptotically efficient quantizing,” IEEE Transactions on Information Theory, vol. 14, no. 5, pp. 676–683, 1968.
[20] C. Seo, J. Moon, and J. Han, “Rate control for consistent objective quality in high efficiency video coding,” IEEE Transactions on Image Processing, vol. 22, no. 6, pp. 2442–2454, 2013.
[21] F. Müller, “Distribution shape of two-dimensional DCT coefficients of natural images,” Electronics Letters, vol. 29, no. 22, pp. 1935–1936, 1993.
[22] T. Eude, R. Grisel, H. Cherifi, and R. Debrie, “On the distribution of the DCT coefficients,” in Proceedings of ICASSP ’94. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 5, 1994, pp. V/365–V/368.
[23] E. Y. Lam and J. W. Goodman, “A mathematical analysis of the DCT coefficient distributions for images,” IEEE Transactions on Image Processing, vol. 9, no. 10, pp. 1661–1666, 2000.
[24] E. Yang, X. Yu, J. Meng, and C. Sun, “Transparent composite model for DCT coefficients: Design and analysis,” IEEE Transactions on Image Processing, vol. 23, no. 3, pp. 1303–1316, 2014.
[25] X. Li, N. Oertel, A. Hutter, and A. Kaup, “Laplace distribution based Lagrangian rate distortion optimization for hybrid video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 2, pp. 193–205, 2009.
[26] J. Cui, S. Wang, S. Wang, X. Zhang, S. Ma, and W. Gao, “Hybrid Laplace distribution-based low complexity rate-distortion optimized quantization,” IEEE Transactions on Image Processing, vol. 26, no. 8, pp. 3802–3816, 2017.
[27] N. Kamaci, Y. Altunbasak, and R. M. Mersereau, “Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 8, pp. 994–1006, 2005.
[28] M. R. Ardestani, A. A. B. Shirazi, and M. R. Hashemi, “Rate-distortion modeling for scalable video coding,” in , 2010, pp. 923–928.
[29] G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74–90, 1998.
[30] H. Everett III, “Generalized Lagrange multiplier method for solving problems of optimum allocation of resources,” Operations Research, vol. 11, no. 3, pp. 399–417, 1963.
[31] B. Li, J. Xu, D. Zhang, and H. Li, “QP refinement according to Lagrange multiplier for high efficiency video coding,” in , 2013, pp. 477–480.
[32] Y. Altunbasak and N. Kamaci, “An analysis of the DCT coefficient distribution with the H.264 video coder,” in , vol. 3, 2004, pp. iii–177.
[33] S. Kullback and R. A. Leibler, “On information and sufficiency,” The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951.
[34] G. J. Sullivan, “Adaptive quantization encoding technique using an equal expected-value rule,” Joint Video Team of ISO/IEC and ITU-T, doc. JVT-N011, Jan. 2005.
[35] H. Schwarz, “Hierarchical B pictures,” Joint Video Team (JVT) of ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT-P014, 2005.
[36] S. Hu, H. Wang, S. Kwong, T. Zhao, and C.-C. J. Kuo, “Rate control optimization for temporal-layer scalable video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 8, pp. 1152–1162, 2011.
[37] S. Wang, S. Ma, S. Wang, D. Zhao, and W. Gao, “Rate-GOP based rate control for high efficiency video coding,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 1101–1111, 2013.
[38] J. He, E. Yang, F. Yang, and K. Yang, “Adaptive quantization parameter selection for H.265/HEVC by employing inter-frame dependency,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 12, pp. 3424–3436, 2018.
[39] “VVC software VTM-3.0,” https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/tags/VTM-3.0/.
[40] F. Bossen, J. Boyce, K. Suehring, X. Li, and V. Seregin, “JVET common test conditions and software reference configurations for SDR video,” JVET L1010, Oct. 2018.
[41] G. Bjontegaard, “Improvements of the BD-PSNR model,”