Low Complexity Trellis-Coded Quantization in Versatile Video Coding

Abstract

The forthcoming Versatile Video Coding (VVC) standard adopts trellis-coded quantization, which leverages a delicate trellis graph to map the quantization candidates within one block onto the optimal path. Despite its high compression efficiency, the complex trellis search with soft decision quantization may hinder applications due to its high complexity and low throughput. To reduce the complexity, in this paper, we propose a low complexity trellis-coded quantization scheme in a scientifically sound way with theoretical modeling of the rate and distortion. As such, the trellis departure point can be adaptively adjusted, and unnecessarily visited branches are accordingly pruned, leading to fewer trellis stages and simpler transition branches. Extensive experimental results on the VVC test model show that the proposed scheme reduces the encoding complexity by 11% and 5% under the all intra and random access configurations, respectively, at the cost of only 0.11% and 0.05% BD-Rate increase. Meanwhile, on average 24% and 27% quantization time savings can be achieved under the all intra and random access configurations. Due to its performance, one implementation of the proposed scheme has been adopted into the VVC test model.

Meng Wang, Shiqi Wang, Member, IEEE, Junru Li, Li Zhang, Member, IEEE, Yue Wang, Siwei Ma, Member, IEEE, and Sam Kwong, Fellow, IEEE

Index Terms: Trellis-coded quantization, soft quantization, rate distortion optimization, VVC, video coding.

I. INTRODUCTION

Recent years have witnessed the rapid development of video coding technologies. Newly adopted coding tools afford more mode options to cope with the various characteristics of video sequences, leading to significant improvements in coding efficiency. Video coding standards such as H.264/AVC [1], HEVC [2], AVS [3] and VVC [4] specify the semantics of the decoding process, leaving room for encoder optimization and complexity reduction. To coordinate the behaviors of individual or multiple coding tools, rate distortion optimization (RDO) [5] is employed throughout the encoding stages. Consequently, the optimal combination of encoding modes and parameters can be systematically determined by RDO, further improving compression performance. Given the maximum allowed rate $R_{max}$, the aim of RDO is to minimize the encoding distortion by attempting different combinations of encoding parameters and modes from the set $\mathcal{M}$, which can be described as follows,

$$\min_{\mathcal{M}} \{D\} \quad \text{subject to} \quad R \le R_{max}. \tag{1}$$

Herein, Lagrangian optimization [6] is used to convert the constrained problem in Eqn. (1) into an unconstrained one,

$$\min_{\mathcal{M}} \{J\}, \quad \text{where} \quad J = D + \lambda \cdot R, \tag{2}$$

where $J$ is the RD cost and $\lambda$ is the Lagrange multiplier. $R$ is the number of bits and $D$ indicates the distortion. In general, the full encoding procedure, including transform, quantization, entropy coding, inverse quantization and inverse transform, is required to obtain $R$ and $D$ for an individual mode. Furthermore, numerous RD cost calculations must be performed over different candidate modes and their combinations. This imposes a heavy computational burden on the encoder, which may impede the implementation and deployment of new video coding standards in real application scenarios.

M. Wang, S. Wang and S. Kwong are with the Department of Computer Science, City University of Hong Kong, Hong Kong, China. J. Li and S. Ma are with the Institute of Digital Media, Peking University, Beijing, China. L. Zhang is with Bytedance Inc., San Diego, CA, USA. Y. Wang is with Bytedance (HK) Limited, Hong Kong, China. (Corresponding author: Shiqi Wang.)

Typically, there are three ways to economize the encoding computational complexity. The first achieves bottom-level speedup by employing single-instruction multiple-data (SIMD) instructions [7], focusing on specific modules with data-intensive operations, such as intra prediction, motion compensation, transformation and filtering. As a result, the time consumed by each mode attempt can be reduced by SIMD without any change in coding performance. The second way alleviates the encoder burden by pruning mode attempts, which reduces the number of RDO rounds.
More specifically, improbable mode candidates are inferred experimentally or theoretically and skipped directly, without RD cost calculation or comparison, which saves overall encoding time [8]. However, the remaining modes must still be evaluated with RDO. The third way focuses on decreasing the complexity of the RD calculation itself, where the rate and distortion are estimated instead of exhaustively going through the tedious encoding workflow [9, 10]. Quantization, whose quality is judged both by the fidelity of the reproduced signal relative to the original and by the resulting representation cost, has evolved rapidly across video coding standards. Soft decision quantization (SDQ) [11–13] introduces rate-distortion optimization into quantization level determination, which improves coding efficiency but simultaneously raises computational complexity compared to conventional hard decision quantization (HDQ) [14]. During the standardization of H.264/AVC [1], HEVC [2] and AVS2 [3], a classical SDQ method, rate distortion optimized quantization (RDOQ) [13], was adopted, and desirable coding performance was achieved. However, the computational complexity of RDOQ becomes a barrier, since entropy coding must be performed for each candidate along with context model updating. In VVC, besides RDOQ, trellis-coded quantization (TCQ) [15], also termed dependent quantization, is adopted. With TCQ, quantization candidates are deployed into a trellis graph at the block level in cooperation with state transitions, converting the search for the optimal quantization solution into an optimal path searching task. As such, the statistical dependencies among quantization outcomes within one coded block can be exploited.
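To make the HDQ/SDQ distinction concrete, the Python sketch below quantizes a single coefficient both ways. It assumes a dead-zone scalar quantizer with rounding offset $f = 1/2$ for HDQ and, for SDQ, picks among the candidates {HDQ level, HDQ level − 1, 0} the one minimizing $D + \lambda \cdot R$ as in Eqn. (2). The rate function `toy_rate` and all function names are hypothetical stand-ins, not the context-adaptive entropy coder used by RDOQ or VVC.

```python
import math
from typing import Callable


def hdq_level(c: float, q_step: float, f: float = 0.5) -> int:
    """Hard decision: sign(c) * floor(|c| / q_step + f), ignoring rate."""
    return int(math.copysign(math.floor(abs(c) / q_step + f), c))


def sdq_level(c: float, q_step: float, lam: float,
              rate_bits: Callable[[int], float], f: float = 0.5) -> int:
    """Soft decision: choose the level minimising J = D + lam * R (cf. Eqn. (2))."""
    l_hdq = abs(hdq_level(c, q_step, f))
    candidates = sorted({0, max(l_hdq - 1, 0), l_hdq})

    def cost(level: int) -> float:
        distortion = (abs(c) - level * q_step) ** 2  # squared reconstruction error
        return distortion + lam * rate_bits(level)

    best = min(candidates, key=cost)  # ties resolve to the smaller level
    return int(math.copysign(best, c))


def toy_rate(level: int) -> float:
    """Hypothetical bit cost of an absolute level; illustration only."""
    return 1.0 if level == 0 else 2.0 + level


if __name__ == "__main__":
    c, q_step = 5.2, 4.0
    print(hdq_level(c, q_step))                   # 1
    print(sdq_level(c, q_step, 20.0, toy_rate))   # 0: zero is cheaper in RD terms
    print(sdq_level(c, q_step, 5.0, toy_rate))    # 1: with a smaller lambda the level survives
```

This decision is made coefficient by coefficient; TCQ instead makes the choices jointly along a trellis path.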
Unlike HDQ, which quantizes without considering the coding bits, and RDOQ, which optimizes the RD behavior coefficient by coefficient given the decisions made so far, TCQ maps the coefficients onto the trellis graph using a vector quantizer and takes the path with the minimum RD cost as the optimal quantization solution. TCQ achieves superior coding performance compared to RDOQ, with 3.5% and 2.4% bit-rate savings reported under the all intra (AI) and random access (RA) configurations, respectively, on the VTM-1.0 platform [16]. However, a significant increase in encoding complexity has also been observed, which is attributed to the RD calculation, accumulation and comparison at each stage and each node during TCQ. Many studies have addressed the complexity of sophisticated quantization processes. Huang and Chen [17] presented an analytical method to address the RDO-based quantization problem for H.264/AVC. In [10], models of the rate and distortion variations, $\Delta R$ and $\Delta D$, were investigated, with which improbable quantization candidates in RDOQ can be efficiently excluded. In [18], transform coefficients are modeled with a Laplacian distribution, which can be further utilized to deduce the block-level RD performance for RDOQ. A trellis-coded quantization method was studied in [12] for H.264/AVC, where all potential candidates along with their coding contexts are mapped into the trellis graph, leading to improved coding efficiency. However, the computational complexity of the optimal path search is extremely high for both software and hardware implementations. To tackle this problem, Yin et al.
[9] proposed a fast soft decision quantization algorithm that classifies quantization levels as safe or unsafe based on estimated variations in rate and distortion. Though these previous works are effective in lessening the quantization complexity for H.264/AVC and HEVC, they are not applicable to the trellis-based quantization in VVC. Herein, we propose a low complexity TCQ scheme for VVC by modeling the rate and distortion in a scientifically sound way. With the proposed models, the RD performance of different quantization candidates can be effectively evaluated, and computationally intensive parts of the searching process in TCQ can be safely eliminated. In particular, the trellis departure point is adaptively determined, by which the total number of trellis stages can be shrunk. Moreover, a branch pruning scheme is developed based on the RD models, which is conducive to decreasing the operational complexity of TCQ.

II. STATISTICAL RATE AND DISTORTION MODELS

In the literature, rate and distortion models have been statistically established according to the probability distribution of transform coefficients [19–24]. The distribution of transform coefficients has been studied for several decades, including the Laplacian distribution [21], [24], the Cauchy distribution [22], the generalized Gaussian distribution [23] and combined distributions [25]. The generalized Gaussian distribution offers the best modeling accuracy owing to its flexible shape and scale parameters. However, these parameters are difficult to estimate, which significantly hinders its applications. In addition, the Cauchy distribution may not be appropriate for the RD modeling task in the encoder, since its mean and variance are undefined. By contrast, the Laplacian distribution is regarded as the best trade-off between modeling complexity and accuracy [24]. Numerous rate and distortion models have been proposed in the literature, and they can be further applied to rate control, bit allocation and fast mode selection. A block-level rate estimation scheme was presented in [23] to speed up the RDO selection for H.264/AVC, where each sub-band of transform coefficients is modeled with a generalized Gaussian distribution. In [22], frame-level bits are approximated and allocated by modeling the AC coefficients with a Cauchy distribution. Moreover, a rate model was established in the $\rho$-domain [26] based on the assumption of Gaussian and Laplacian distributions, where a linear relationship between the rate and the percentage of non-zero coefficients was derived. In [17, 27–29], the rate was also modeled with the $\ell_1$-norm of the quantized coefficients. In this section, we develop rate and distortion models dedicated to the sophisticated quantization design in VVC based on the Laplacian distribution.
In particular, let $C_s^{(i)}$ be the transform coefficient located at position $i$ in a coding block of size $W \times H$. The scalar quantization is given by,

$$l_s^{(i)} = \mathrm{sign}\big(C_s^{(i)}\big) \cdot \left\lfloor \frac{|C_s^{(i)}|}{Q_{step}} + f \right\rfloor, \quad i \in [0, W \times H - 1], \tag{3}$$

where $l_s^{(i)}$ is the corresponding scalar quantized coefficient. The parameter $f$ controls the rounding offset and is set to $1/2$ during the pre-quantization process in VVC and HEVC soft quantization. $Q_{step}$ represents the quantization step size.

The Laplacian distribution is adopted here to model the transform residuals. In particular, the probability density function (PDF) of the transform coefficient $C_s^{(i)}$ is given by,

$$p(x) = \frac{\Lambda}{2} e^{-\Lambda \cdot |x|}, \tag{4}$$

where $\Lambda$ is the Laplacian parameter, which can be determined from the standard deviation $\sigma$ as follows,

$$\Lambda = \frac{\sqrt{2}}{\sigma}. \tag{5}$$

A. Relationship between Rate and $\ell_0$-norm of Coefficients

In general, given an allowed distortion level $D(x, \hat{x}) = |x - \hat{x}|$, Shannon's source coding theorem gives the minimum number of bits for coding a symbol as,

$$R(D) = \log_2\left(\frac{1}{\Lambda \cdot D}\right). \tag{6}$$

As such, for a preset $Q_{step}$, the associated distortion is given by,

$$D(Q_{step}) = D_0(Q_{step}) + D_{\bar{0}}(Q_{step}), \tag{7}$$

where

$$D_0(Q_{step}) = 2 \cdot \int_0^{(1-f) \cdot Q_{step}} p(x) \cdot x \, dx, \tag{8}$$

$$D_{\bar{0}}(Q_{step}) = 2 \cdot \sum_{l=1}^{\infty} \int_{(l-f) \cdot Q_{step}}^{(l+1-f) \cdot Q_{step}} p(x) \cdot |x - l \cdot Q_{step}| \, dx. \tag{9}$$

Herein, $l$ represents the quantization level. With $f = 1/2$, $D_0(Q_{step})$ and $D_{\bar{0}}(Q_{step})$ can be derived as follows,

$$D_0(Q_{step}) = -\frac{Q_{step}}{2} \cdot \tau + \frac{1}{\Lambda} \cdot (1 - \tau), \tag{10}$$

$$D_{\bar{0}}(Q_{step}) = \frac{Q_{step}}{2} \cdot \tau + \frac{1}{\Lambda} \cdot \frac{\tau}{1 - \tau^2} \cdot \left(2\tau - \tau^2 - 1\right), \tag{11}$$

where $\tau = e^{-\Lambda Q_{step}/2}$. Therefore, $D(Q_{step})$ can be represented as,

$$D(Q_{step}) = \frac{1}{\Lambda} \cdot \left[1 - \tau + \frac{\tau \cdot (2\tau - \tau^2 - 1)}{1 - \tau^2}\right] = \frac{1}{\Lambda} \cdot \frac{1 - \tau}{1 + \tau}. \tag{12}$$

Given the PDF of the transform coefficients, the percentage of non-zero quantized coefficients can be estimated as follows,

$$P_{nz} = 1 - \int_{-(1-f) \cdot Q_{step}}^{(1-f) \cdot Q_{step}} \frac{\Lambda}{2} e^{-\Lambda |x|} \, dx = \tau. \tag{13}$$

Moreover, $P_{nz}$ can also be represented with the $\ell_0$-norm,

$$P_{nz} = \frac{L_0}{W \times H}, \tag{14}$$

where $P_{nz}$ lies within $[0, 1]$ and $L_0$ denotes the $\ell_0$-norm of the quantized coefficients in a coding block. Furthermore, the relationship between the coding bits and the percentage of non-zero quantized coefficients can be obtained by substituting Eqn. (12) and Eqn. (13) into Eqn. (6) as follows,

$$\hat{R} = \log_2\left(\frac{1 + P_{nz}}{1 - P_{nz}}\right). \tag{15}$$

The Taylor expansion of Eqn. (15) can be expressed as,

$$\hat{R} = \frac{2}{\ln 2} P_{nz} + \frac{2}{3 \ln 2} P_{nz}^3 + \frac{2}{5 \ln 2} P_{nz}^5 + \cdots. \tag{16}$$

As such, we have the following relationship,

$$\hat{R} = P_{nz} \cdot \left(\frac{2}{\ln 2} + \frac{2}{3 \ln 2} P_{nz}^2 + \frac{2}{5 \ln 2} P_{nz}^4 + \cdots\right) \approx \alpha \cdot L_0. \tag{17}$$

At low rates, where the higher-order terms are negligible, the relationship is approximately linear in $P_{nz}$, which corresponds to the $\rho$-domain model [26].

B. Relationship between Rate and $\ell_1$-norm of Coefficients

Herein, we further model the rate from the perspective of self-information [30]. In particular, the self-information of a quantized symbol $l$ is given by,

$$\hat{r} = -\log_2 p(l_s = l), \tag{18}$$

where $p(l_s = l)$ denotes the probability that the scalar quantization result $l_s$ equals $l$. Given the quantization step $Q_{step}$ and the rounding offset $f$, $p(l_s = l)$ is represented as follows,

$$p(l_s = l) = \begin{cases} 2 \cdot \int_0^{(1-f) \cdot Q_{step}} p(x) \, dx, & l = 0, \\ \int_{(|l|-f) \cdot Q_{step}}^{(|l|+1-f) \cdot Q_{step}} p(x) \, dx, & l \neq 0. \end{cases} \tag{19}$$

By integrating the Laplacian distribution into Eqn. (19) with $f = 1/2$, the probability of the quantized symbol can be expressed as,

$$p(l_s = l) = \begin{cases} 1 - e^{-\Lambda Q_{step}/2}, & l = 0, \\ \frac{1}{2}\left(e^{-\Lambda Q_{step}(|l| - \frac{1}{2})} - e^{-\Lambda Q_{step}(|l| + \frac{1}{2})}\right), & l \neq 0. \end{cases} \tag{20}$$

With Eqn. (4), Eqn. (18) and Eqn. (20), the rate of the quantized symbol can be approximated. For the case of $l = 0$, $\hat{R}$ can be estimated as,

$$\hat{R} = -\log_2\left(1 - e^{-\Lambda Q_{step}/2}\right) = \beta_0 \cdot |l| + b_0, \tag{21}$$

where

$$\beta_0 = 0, \quad b_0 = -\log_2\left(1 - e^{-\Lambda Q_{step}/2}\right). \tag{22}$$

For the case of $l \neq 0$, $\hat{R}$ can be approximated as,

$$\hat{R} = -\log_2\left[\frac{1}{2}\left(e^{-\Lambda Q_{step}(|l| - \frac{1}{2})} - e^{-\Lambda Q_{step}(|l| + \frac{1}{2})}\right)\right] = \Lambda Q_{step} \cdot \log_2(e) \cdot |l| + 1 - \log_2\left(e^{\Lambda Q_{step}/2} - e^{-\Lambda Q_{step}/2}\right) = \beta_1 \cdot |l| + b_1, \tag{23}$$

where

$$\beta_1 = \Lambda Q_{step} \cdot \log_2(e), \quad b_1 = 1 - \log_2\left(e^{\Lambda Q_{step}/2} - e^{-\Lambda Q_{step}/2}\right). \tag{24}$$

As such, the total coding bits of a coding block can be expressed as,

$$\hat{R} = \beta \cdot L_1 + b, \tag{25}$$

where $L_1$ denotes the $\ell_1$-norm of the quantized coefficients and $b$ represents the combination of $b_0$ and $b_1$. In this regard, the number of coding bits of a block is determined by the $\ell_1$-norm of the coefficients.

C. Rate modeling
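The two single-norm estimates above can be checked numerically before they are combined. The Python sketch below, assuming $f = 1/2$ as in Eqns. (13) and (20), evaluates the exact rate of Eqn. (15), its truncated Taylor series from Eqn. (16), and the per-symbol self-information rate of Eqns. (21)-(24). The function names are illustrative only.

```python
import math


def laplace_lambda(sigma: float) -> float:
    """Eqn. (5): Laplacian parameter from the standard deviation."""
    return math.sqrt(2.0) / sigma


def p_nonzero(lam: float, q_step: float) -> float:
    """Eqn. (13) with f = 1/2: P_nz = tau = exp(-lam * q_step / 2)."""
    return math.exp(-lam * q_step / 2.0)


def rate_exact(p_nz: float) -> float:
    """Eqn. (15): bits per coefficient as a function of P_nz (0 <= P_nz < 1)."""
    return math.log2((1.0 + p_nz) / (1.0 - p_nz))


def rate_taylor(p_nz: float, terms: int) -> float:
    """Eqn. (16) truncated to the first `terms` odd powers of P_nz."""
    return sum(2.0 / ((2 * k + 1) * math.log(2.0)) * p_nz ** (2 * k + 1)
               for k in range(terms))


def rate_self_info(level: int, lam: float, q_step: float) -> float:
    """Eqns. (21)-(24): -log2 p(l_s = level) for the Laplacian source, f = 1/2."""
    a = lam * q_step
    if level == 0:
        return -math.log2(1.0 - math.exp(-a / 2.0))          # beta_0 = 0, b_0
    beta_1 = a * math.log2(math.e)
    b_1 = 1.0 - math.log2(math.exp(a / 2.0) - math.exp(-a / 2.0))
    return beta_1 * abs(level) + b_1                          # linear in |l|


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.8):
        exact, first = rate_exact(p), rate_taylor(p, 1)
        print(f"P_nz={p}: exact={exact:.3f}, first-order={first:.3f}")
```

At $P_{nz} = 0.1$ the first-order term stays within 0.5% of Eqn. (15), whereas at $P_{nz} = 0.8$ it falls short by roughly 27%, which is the high-rate underestimation discussed next.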

Fig. 1. Illustration of the Taylor expansion and its first-, second- and third-order approximations with respect to $P_{nz}$.

Fig. 2. Illustration of the actual coding bits $R$, $\ell_0$-norm, $\ell_1$-norm and estimated coding bits for the sequence "RaceHorses". (a-d) $R$ versus the $\ell_0$-norm of quantized transform coefficients $L_0$; (e-h) $R$ versus the $\ell_1$-norm of quantized transform coefficients $L_1$; (i-l) $R$ versus the estimated coding bits $\hat{R}$.

The $\ell_0$-norm and $\ell_1$-norm of the coefficients play complementary roles in approximating the number of coding bits, and employing only one of them may lead to a biased approximation. First, the low bit rate assumption made in Eqn. (16) may not always hold. As illustrated in Fig. 1, when $P_{nz}$ exceeds 50%, corresponding to high bit rate coding, the actual bits could be underestimated when adopting the $\ell_0$-norm only. In addition, the $\ell_0$-norm estimates the rate in a statistical manner without considering the individual levels of the coefficients. Obviously, larger coefficients consume more coding bits, which cannot be characterized by the $\ell_0$-norm alone; this supports incorporating the $\ell_1$-norm. On the other hand, the $\ell_1$-norm only pays attention to the individual coding elements and ignores the dependencies and contexts in the coefficient coding process. Therefore, it is reasonable to equip the rate model with both the $\ell_0$-norm and the $\ell_1$-norm. Moreover, position information, especially the location of the last non-zero coefficient, also influences the final coding bits. As such, the final rate model is given by,

$$\hat{R} = \alpha \cdot L_0 + \beta \cdot L_1 + \gamma \cdot R_{LP} + \epsilon, \tag{26}$$

where $\alpha$, $\beta$, $\gamma$ and $\epsilon$ denote the model parameters. In particular, the parameters $\alpha$ and $\beta$, which control the relationship between the rate and the $\ell_0$/$\ell_1$-norms of the quantized coefficients, highly rely on $Q_{step}$ and the block size. $L_0$ and $L_1$ represent the $\ell_0$-norm and $\ell_1$-norm of the current CU, respectively. In VVC, each coordinate of the last significant coefficient is composed of a prefix and a suffix, wherein the prefix is context coded with truncated unary bins and the suffix is bypass coded with fixed-length bins.
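A minimal sketch of the combined model in Eqn. (26) follows. The parameter values are hypothetical placeholders (the text fits $\alpha$, $\beta$, $\gamma$ and $\epsilon$ per $Q_{step}$ and block size), the class name `RateModel` is illustrative, and the last-position bits $R_{LP}$ are simply passed in as a number. The example contrasts two blocks with the same $\ell_0$-norm but different $\ell_1$-norms, which an $\ell_0$-only model cannot tell apart.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RateModel:
    """Eqn. (26): R_hat = alpha*L0 + beta*L1 + gamma*R_LP + eps (hypothetical fitted values)."""
    alpha: float
    beta: float
    gamma: float
    eps: float

    def estimate(self, levels: Sequence[int], r_lp: float) -> float:
        l0 = sum(1 for v in levels if v != 0)     # l0-norm: number of non-zero levels
        l1 = sum(abs(v) for v in levels)          # l1-norm: sum of absolute levels
        return self.alpha * l0 + self.beta * l1 + self.gamma * r_lp + self.eps


if __name__ == "__main__":
    full = RateModel(alpha=1.5, beta=0.8, gamma=1.0, eps=2.0)
    l0_only = RateModel(alpha=1.5, beta=0.0, gamma=1.0, eps=2.0)
    block_a, block_b = [3, 0, -1, 0], [1, 0, -1, 0]   # same l0-norm, different l1-norm
    print(l0_only.estimate(block_a, 6.0), l0_only.estimate(block_b, 6.0))  # identical
    print(full.estimate(block_a, 6.0), full.estimate(block_b, 6.0))        # differ
```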
Here, the coding bits of the coordinates $(x, y)$ of the last non-zero coefficient are represented by $R_{LP}$, which is obtained in practice from a look-up table. The relationship between the actual coding bits $R$ and the estimated coding bits $\hat{R}$ is illustrated in Fig. 2 for various QPs, showing that the model estimates the rate in VVC with high accuracy.

D. Distortion modeling

To measure the quantization distortion, the sum of squared errors (SSE) is adopted, and the SSE of a coding block can be straightforwardly represented as follows,

$$D = \sum_{i=0}^{W \times H - 1} \left[Q^{-1}\big(l_s^{(i)}\big) - C_s^{(i)}\right]^2 = \sum_{i=0}^{W \times H - 1} \left[\big(Q_{step} \cdot l_s^{(i)}\big)^2 - 2 \cdot Q_{step} \cdot l_s^{(i)} \cdot C_s^{(i)} + \big(C_s^{(i)}\big)^2\right], \tag{27}$$

where $Q^{-1}(\cdot)$ indicates the inverse quantization.

III. PROBLEM FORMULATION

Scalar quantization has been widely used in video coding standards owing to its computational simplicity, as it generally employs one quantizer associated with a specific quantization step. TCQ was studied as early as 1990 [31]; it can be regarded as a large-dimension vector quantization with constrained vector components and is capable of remedying, to some extent, the inevitable performance loss incurred by scalar quantization. The TCQ in VVC is implemented as dependent scalar quantization that maintains two quantizers, $Q_0$ and $Q_1$, with four transition states. To be more specific, TCQ embeds the quantization candidates of one block into a trellis graph, wherein the best quantization outcome corresponds to the path with the minimum RD cost. In this manner, the inter-dependencies of transform coefficients can be well exploited; moreover, the augmented quantizers and candidates make the admissible reconstruction vectors more densely packed. As such, significantly better RD performance can be achieved. More specifically, in TCQ of VVC, given a transform coefficient $C_s^{(i)}$, several quantization candidates can be obtained based on the pre-quantization results. Subsequently, the quantization level $l^{(i)}$ is further converted into the quantization index $\tilde{l}^{(i)}$. Since the quantization index is roughly half of the original quantization level, coding bits can naturally be saved. The parity of the current quantization index, together with the current state, determines the state transition route and the quantizer for the next coefficient, as illustrated in Fig. 3. Consequently, the reconstruction of $Q_0$ is always associated with even multiples of the quantization step $Q_{step}$, while $Q_1$ is bound to odd multiples of $Q_{step}$ (plus zero).
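The parity-driven switching between $Q_0$ and $Q_1$ described above can be sketched in a few lines of Python, following the reconstruction rule of Eqn. (28) and Algorithm 1. The packed transition constant 32040 is the one used in the VVC specification's dependent-quantization state update; the function names are illustrative, and the input indices are assumed to be listed in processing (reverse scan) order.

```python
from typing import List, Sequence

# Packed next-state table: 2 bits per (state, parity) pair, as in the VVC state update.
STATE_TRANS_TABLE = 32040


def next_state(state: int, index: int) -> int:
    """Next trellis state from the current state and the parity of the index."""
    return (STATE_TRANS_TABLE >> ((state << 2) + ((index & 1) << 1))) & 3


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def dequantize(indices: Sequence[int], q_step: float) -> List[float]:
    """Reconstruct coefficients from quantization indices (cf. Eqn. (28)).

    States 0/1 use Q0 (even multiples of q_step); states 2/3 use Q1 (odd multiples or zero).
    """
    state = 0  # the state machine starts in state 0
    recon = []
    for k in indices:
        recon.append((2 * k - (state >> 1) * sign(k)) * q_step)
        state = next_state(state, k)
    return recon


if __name__ == "__main__":
    print(dequantize([1, 1, -2, 0], 1.0))  # [2.0, 1.0, -3.0, 0.0]
```

Because the next quantizer depends only on the current state and the index parity, a decoder can follow the same path without any side information.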
The reconstruction process is illustrated in Algorithm 1, where $N_m$ denotes the number of quantization indices, corresponding to the trellis stages within one coding block, and $i$ represents the processing order. The quantization distortion and rate of each coefficient are calculated and recorded at each trellis node following the scanning order. Given the current quantization index $\tilde{l}^{(i)}$ and transition state $St^{(i)}$, the quantization level can be reconstructed as follows,

$$l^{(i)} = 2 \cdot \tilde{l}^{(i)} - \big(St^{(i)} \gg 1\big) \cdot \mathrm{sign}\big(\tilde{l}^{(i)}\big), \tag{28}$$

where $St^{(i)} \gg 1$ specifies the quantizer in use and in turn introduces even or odd multiples of the quantization step. As such, the distortion is given by [15],

$$D\big(\tilde{l}^{(i)}, St^{(i)}\big) = \left[C^{(i)} - Q_{step} \cdot \Big(2 \cdot \tilde{l}^{(i)} - \big(St^{(i)} \gg 1\big) \cdot \mathrm{sign}\big(\tilde{l}^{(i)}\big)\Big)\right]^2. \tag{29}$$

The absolute value of the quantization index $|\tilde{l}^{(i)}|$ is entropy coded by signaling the syntax elements $sig$, $gt1$, $par$ and $gtx$ in regular mode, while the remaining level, denoted by $rem$, is binarized with a Golomb-Rice code and coded in bypass mode. Besides the four states involved in the state transition loop, a special "uncoded" state is introduced, which attempts to truncate the residuals located in the high frequency region in an effort to further save coding bits. An exemplified trellis graph of one coding block is illustrated in Fig. 4. Switching from the "uncoded" state to State 0 or State 2 is only allowed when a non-zero quantization index is encountered.

Following the scanning order, the cost of each stage is accumulated along the transition path until the end of the block is reached. Typically, multiple entering paths with different accumulated costs reach the same node, and only the path with the lowest cost is retained as the survivor path.
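The survivor-path search just described is a Viterbi recursion over the four states. The Python sketch below implements it under simplifying assumptions: the "uncoded" state and the $R_{cbf}$/$R_{LP}$ terms are omitted, the candidate indices for each coefficient are simply 0 up to a small hypothetical bound, and `toy_rate` is a hypothetical bit cost rather than VVC's context-coded rate. For each state it keeps only the cheapest path reaching it, which is what keeps the search linear in the number of stages.

```python
from typing import Callable, List, Sequence, Tuple

STATE_TRANS_TABLE = 32040  # packed VVC next-state table (2 bits per state/parity pair)


def next_state(state: int, k: int) -> int:
    return (STATE_TRANS_TABLE >> ((state << 2) + ((k & 1) << 1))) & 3


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def branch_cost(c: float, k: int, state: int, q_step: float, lam: float,
                rate_bits: Callable[[int], float]) -> float:
    """Distortion of Eqn. (29) plus lam times the (toy) rate of index k."""
    recon = (2 * k - (state >> 1) * sign(k)) * q_step
    return (c - recon) ** 2 + lam * rate_bits(abs(k))


def path_cost(coeffs, indices, q_step, lam, rate_bits) -> float:
    """Accumulated RD cost of one explicit index sequence (for checking)."""
    state, total = 0, 0.0
    for c, k in zip(coeffs, indices):
        total += branch_cost(c, k, state, q_step, lam, rate_bits)
        state = next_state(state, k)
    return total


def tcq_viterbi(coeffs: Sequence[float], q_step: float, lam: float,
                rate_bits: Callable[[int], float]) -> Tuple[List[int], float]:
    """Minimum-cost index sequence; coeffs are given in processing order."""
    inf = float("inf")
    cost = [0.0, inf, inf, inf]            # start in state 0
    paths: List[List[int]] = [[], [], [], []]
    for c in coeffs:
        new_cost = [inf] * 4
        new_paths: List[List[int]] = [[], [], [], []]
        s = 1 if c >= 0 else -1
        k_max = int(abs(c) / (2 * q_step)) + 2   # hypothetical candidate bound
        for st in range(4):
            if cost[st] == inf:
                continue
            for a in range(k_max + 1):
                k = s * a
                j = cost[st] + branch_cost(c, k, st, q_step, lam, rate_bits)
                ns = next_state(st, k)
                if j < new_cost[ns]:             # add-compare-select
                    new_cost[ns] = j
                    new_paths[ns] = paths[st] + [k]
        cost, paths = new_cost, new_paths
    best = min(range(4), key=lambda st: cost[st])
    return paths[best], cost[best]


def toy_rate(abs_index: int) -> float:
    """Hypothetical bit cost; illustration only."""
    return 1.0 if abs_index == 0 else 2.0 + abs_index


if __name__ == "__main__":
    idx, j = tcq_viterbi([5.3, -2.1, 0.4, 7.9], q_step=1.0, lam=2.0, rate_bits=toy_rate)
    print(idx, round(j, 3))
```

The per-stage work is bounded by the number of states times the number of candidates, in line with the branch count of Eqn. (34); the `path_cost` helper exists only to verify the result by brute force on tiny inputs.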
Considering the reverse scanning order, the cost accumulation can be described as follows,

$$J^{(i)} = J^{(i+1)} + D\big(\tilde{l}^{(i)}, St^{(i)}\big) + \lambda \cdot R\big(\tilde{l}^{(i)}\big). \tag{30}$$

In particular, if the state transition is from one "uncoded" state to another "uncoded" state, the RD cost is updated as follows,

$$J^{(i)} = J^{(i+1)} + D(0, 0). \tag{31}$$

By contrast, if the state switches from the "uncoded" state to State 0 or State 2, $J^{(i)}$ is calculated as follows,

$$J^{(i)} = J^{(i+1)} + D\big(\tilde{l}^{(i)}, St^{(i)}\big) + \lambda \cdot \left[R\big(\tilde{l}^{(i)}\big) + R_{cbf} + R_{LP}\big(x^{(i)}, y^{(i)}\big)\right], \tag{32}$$

where $R_{cbf}$ denotes the bits used for representing the change of the coded block flag ($cbf$, from 0 to 1), and $R_{LP}$ denotes the bits for the position of the last (i.e., first traversed) non-zero coefficient, whose coordinates are $x^{(i)}$ and $y^{(i)}$.

Algorithm 1

Reconstruction of transform coefficients with trellis coded quantization [4]

Input: $Q_{step}$, $\tilde{l}^{(*)}$
Output: Dequantization results $\hat{C}^{(*)}$
Initialize state: $St^{(N_m)} \leftarrow 0$;
for $i = N_m$; $i \geq 1$; $i$-- do
    $\hat{C}^{(i)} = \big(2 \cdot \tilde{l}^{(i)} - (St^{(i)} \gg 1) \cdot \mathrm{sign}(\tilde{l}^{(i)})\big) \cdot Q_{step}$;
    $St^{(i-1)} = \big(32040 \gg ((St^{(i)} \ll 2) + ((\tilde{l}^{(i)} \,\&\, 1) \ll 1))\big) \,\&\, 3$;
end for

Such cost accumulation, path comparison and optimal branch selection can be regarded as the add-compare-select (ACS) operation [9], while the RD cost calculation is conducted in the branch metric unit (BMU). Essentially, the goal of TCQ is to find the quantization solution that achieves the minimum RD cost for the whole coding block, and the Viterbi algorithm is used to search for the optimal path. Each transform coefficient can be regarded as a stage of the trellis graph, and multiple routes are available at each stage, where each route represents the state transition invoked by the quantization index $\tilde{l}^{(i)}$. After all the paths linked with one node have been attempted, only the path with the lowest accumulated RD cost that connects to the destination node is retained.

The complexity of TCQ is mainly attributed to three factors. The first is the total number of trellis stages $N_m$, which generally corresponds to the number of coefficients within one block. The second is the number of branches $N_b^{(i)}$ linked to a node, which depends on the number of quantization candidates. The third is the number of states at each stage, $N_{state}^{(i)}$. Since the quantization indices are grouped by parity, the number of branches linked to an individual node is halved compared to a fully connected trellis, and the candidate "0" must be counted additionally. Supposing $N_l^{(i)}$ quantization candidates are available for a single transform coefficient, the branch complexity $N_b^{(i)}$ can be described as,

$$N_b^{(i)} = N_l^{(i)} / 2 + 1. \tag{33}$$

For a typical coding block with $N_m$ coefficients, the total branch count of TCQ can be formulated as,

$$N_{TCQ} = \sum_{i=1}^{N_m} N_b^{(i)} \times N_{state}^{(i)}. \tag{34}$$

Consequently, the complexity is proportional to the stage number $N_m$ and the branch number $N_b^{(i)}$. The theoretical computational complexity is summarized in Table I. To handle these three factors while maintaining the efficiency of TCQ, we apply the established rate and distortion models to achieve low complexity TCQ for VVC.

IV. LOW COMPLEXITY TRELLIS-CODED QUANTIZATION

A. Determination of the Trellis Departure Point

We first propose to softly decide the trellis departure point based on the rate and distortion models, with the goal of shrinking the total number of trellis stages $N_m$.

TABLE I
COMPLEXITY ANALYSES OF TCQ

| Module | Branch | BMU: Distortion | BMU: Rate | ACS: Add | ACS: Compare | ACS: Select |
|---|---|---|---|---|---|---|
| TCQ | $N_{TCQ}$ | $\sum_{i=1}^{N_m} N_l^{(i)}$ | $\sum_{i=1}^{N_m} N_l^{(i)}$ | $N_{TCQ}$ | $\sum_{i=1}^{N_m} (N_b^{(i)} - 1) \cdot N_{state}^{(i)}$ | $\sum_{i=1}^{N_m} (N_b^{(i)} - 1) \cdot N_{state}^{(i)}$ |

Fig. 3. Illustration of the state transition [4].

Quantization and residual coefficient coding are conducted on coefficient groups (CGs) in reverse scanning order. As such, the coefficients within one coding block are mapped into the trellis, together with their accessible states, in order from the bottom-right to the top-left. The starting point, which serves as the first non-zero point when traversing the trellis graph, plays a critical role in TCQ. It is widely acknowledged that coefficients located in the high frequency region tend to be quantized to zero by soft quantization in the rate-distortion sense. To verify this, statistical experiments are conducted on the distribution of the last non-zero coefficient position with HDQ and TCQ in $16 \times 16$ coding blocks, as illustrated in Fig. 5. It can be observed that, with TCQ, the positions of the last non-zero coefficient concentrate on smaller scan indices than with HDQ. As QP increases, these distribution differences become more apparent.

As such, by combining the rate-distortion models with TCQ, we propose an algorithm that accurately determines the trellis departure point at low cost. More specifically, we cast this problem into a comparison of RD costs: the RD cost difference associated with two contiguous non-zero quantized coefficients is measured and used to determine the optimal trellis starting point. With the proposed algorithm, the initial point of the trellis graph can be postponed, shrinking the total number of stages as well as the computational complexity of determining the optimal quantized coefficients of the given coding block.

Suppose $i$ and $j$ are two positions that satisfy the following constraints,

$$i, j \in [0, W \times H - 1], \quad i > j, \quad l^{(i)} \neq 0, \quad l^{(j)} \neq 0, \quad l^{(k)} = 0 \ \text{if} \ k \in [j+1, i-1]. \tag{35}$$

Here, we use $C^{(*)}$ and $l^{(*)}$ to represent the absolute values of $C_s^{(*)}$ and $l_s^{(*)}$, respectively. Since coefficient $i$ precedes $j$ during traversal, the quantization result $l^{(i)}$ must be zero if position $j$ serves as the initial point of the trellis.

The rate estimation model in Eqn. (26) is employed, where the total rate of a block is represented by $R_{(0,j)}$ and $R_{(0,i)}$ when positions $j$ and $i$ serve as the trellis initial point, respectively. As such, the rate difference is formulated as follows,

$$\Delta R_{(j,i)} = R_{(0,j)} - R_{(0,i)} = -\left[\alpha + \beta \cdot \tilde{l}^{(i)} + \gamma \cdot \big(R_{LP}^{(i)} - R_{LP}^{(j)}\big)\right], \tag{36}$$

where $\tilde{l}^{(i)}$ denotes the quantization index of $l^{(i)}$.

Regarding the distortion, we further formulate $\Delta D_{(j,i)}$ with $D_{(0,j)}$ and $D_{(0,i)}$, where $D_{(0,j)}$ denotes the overall distortion when the coefficient at position $j$ is regarded as the trellis starting point. $D_{(0,j)}$ and $D_{(0,i)}$ are given by,

$$D_{(0,j)} = \sum_{k=0}^{j} \left[Q^{-1}\big(l^{(k)}\big) - C^{(k)}\right]^2 + \sum_{k=j+1}^{W \times H - 1} \big(C^{(k)}\big)^2, \tag{37}$$

$$D_{(0,i)} = \sum_{k=0}^{j} \left[Q^{-1}\big(l^{(k)}\big) - C^{(k)}\right]^2 + \sum_{k=j+1}^{i-1} \big(C^{(k)}\big)^2 + \left[Q^{-1}\big(l^{(i)}\big) - C^{(i)}\right]^2 + \sum_{k=i+1}^{W \times H - 1} \big(C^{(k)}\big)^2. \tag{38}$$

As such, $\Delta D_{(j,i)}$ can be expressed as,

$$\Delta D_{(j,i)} = D_{(0,j)} - D_{(0,i)} = -\left[\big(Q_{step} \cdot l^{(i)}\big)^2 - 2 \cdot Q_{step} \cdot l^{(i)} \cdot C^{(i)}\right]. \tag{39}$$

The RD cost difference $\Delta J_{(j,i)}$, which characterizes the variation of the RD cost when $i$ is removed from the trellis, can be derived as follows,

$$\Delta J_{(j,i)} = \Delta D_{(j,i)} + \lambda \cdot \Delta R_{(j,i)}. \tag{40}$$

Whether the non-zero coefficient at position $i$ can be eliminated from the trellis is decided according to $\Delta J_{(j,i)}$. In other words, if $\Delta J_{(j,i)} \leq 0$, it is unnecessary to involve the coefficient at position $i$ in the trellis. As such, by combining Eqn. (36), Eqn. (39) and Eqn. (40), an RD-based threshold with respect to $C^{(i)}$ can be derived as follows,

$$C^{(i)} \leq T, \tag{41}$$

where

$$T = \frac{1}{2} \cdot \left(Q_{step} \cdot l^{(i)} + \frac{\lambda \cdot \left(\alpha + \beta \cdot \tilde{l}^{(i)} + \gamma \cdot \big(R_{LP}^{(i)} - R_{LP}^{(j)}\big)\right)}{Q_{step} \cdot l^{(i)}}\right). \tag{42}$$

Fig. 4. Illustration of trellis graph [15].

Fig. 5. Illustration of the position distribution of the last non-zero coefficient index in $16 \times 16$ coding blocks under the AI configuration of VVC. Both HDQ and TCQ are considered, where HDQ corresponds to orange bars and TCQ is represented with blue bars.
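The departure-point test of Eqns. (36)-(42) reduces to a closed-form comparison. The Python sketch below evaluates $\Delta J_{(j,i)}$ directly and checks it against the threshold $T$. The function names are illustrative, and all parameter values in the usage lines are hypothetical, since $\alpha$, $\beta$ and $\gamma$ are fitted per $Q_{step}$ and block size; `level` stands for $l^{(i)}$ and `index` for $\tilde{l}^{(i)}$.

```python
def delta_j(c: float, level: int, index: int, q_step: float, lam: float,
            alpha: float, beta: float, gamma: float,
            r_lp_i: float, r_lp_j: float) -> float:
    """Eqn. (40): RD-cost change when the last non-zero coefficient i is dropped."""
    d_dist = -((q_step * level) ** 2 - 2.0 * q_step * level * c)         # Eqn. (39)
    d_rate = -(alpha + beta * index + gamma * (r_lp_i - r_lp_j))         # Eqn. (36)
    return d_dist + lam * d_rate


def threshold(level: int, index: int, q_step: float, lam: float,
              alpha: float, beta: float, gamma: float,
              r_lp_i: float, r_lp_j: float) -> float:
    """Eqn. (42): dropping coefficient i does not increase J iff |C^(i)| <= T."""
    rate_term = alpha + beta * index + gamma * (r_lp_i - r_lp_j)
    return 0.5 * (q_step * level + lam * rate_term / (q_step * level))


if __name__ == "__main__":
    params = dict(level=1, index=1, q_step=8.0, lam=40.0,
                  alpha=1.2, beta=0.9, gamma=1.0, r_lp_i=7.0, r_lp_j=5.0)
    t = threshold(**params)
    for c in (t - 0.5, t + 0.5):
        dj = delta_j(c, **params)
        print(f"|C|={c:.2f}: dJ={dj:+.2f} -> {'drop' if dj <= 0 else 'keep'}")
```

Since $\Delta J_{(j,i)}$ grows linearly in $C^{(i)}$ with slope $2 \cdot Q_{step} \cdot l^{(i)} > 0$, the sign flips exactly at $T$, which is why a single comparison suffices.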

Assuming that the rate for representing the last position $i$ is larger than or equal to that of position $j$, approximating $\tilde{l}^{(i)}$ with $0.5 \cdot l^{(i)}$ and $R_{LP}^{(i)}$ with $R_{LP}^{(j)}$, and substituting $\lambda$ with $\phi \cdot Q_{step}^2$, $T$ can be simplified as follows,

$$T \approx \frac{1}{2} \cdot Q_{step} \cdot \left(l^{(i)} + \frac{\phi \cdot \alpha}{l^{(i)}} + \frac{\phi \cdot \beta}{2}\right), \tag{43}$$

where $\phi$ is a multiplication factor that can be obtained according to the VVC configuration. Since $l^{(i)} + \phi \cdot \alpha / l^{(i)} \geq 2\sqrt{\phi \cdot \alpha}$, the theoretical minimum value of $T$ can be obtained as follows,

$$T \geq Q_{step} \cdot K, \tag{44}$$

where

$$K = \sqrt{\phi \cdot \alpha} + \frac{\phi \cdot \beta}{4}. \tag{45}$$

Herein, the minimum value of $T$ is adopted as the threshold. If the condition of Eqn. (41) is satisfied, the associated quantized coefficient can be directly set to zero and removed from the trellis graph. In this way, the trellis starting point is postponed to the next non-zero coefficient. Otherwise, the coefficient at position $i$ is regarded as the starting point of the trellis graph.

B. Trellis Pruning

In TCQ, RD cost calculations and examinations are performed on the trellis branches in an effort to detect the optimal quantization results, which introduces high computational cost to the encoder. We propose to perform trellis pruning, which eliminates the unlikely quantization candidates and removes the associated transition routes. In this way, the operation complexity in the BMU and ACS modules can be lowered and the total branch number N_TCQ can be decreased. More specifically, the trellis pruning is carried out based on analyses of the RD cost relationships with the proposed RD model. The quantization candidate set L is in principle composed of five candidates, which can be regarded as adjusting the level l(i) with different offsets Δl. Again, l(i) is the absolute value of the scalar quantization result at position i, and the possible values of Δl are as follows,

$$\Delta l \in \begin{cases} \{-l(i), -2, -1, 0, +1\}, & \text{if } l(i) > 2,\\ \{-2, -1, 0, +1, +2\}, & \text{if } l(i) = 2,\\ \{-1, 0, +1, +2, +3\}, & \text{if } l(i) = 1,\\ \{0, +1, +2, +3, +4\}, & \text{if } l(i) = 0. \end{cases} \tag{46}$$

Fig. 6. Illustration of the trellis pruning for the larger quantization candidates.

Fig. 7. Illustration of the trellis pruning for the smaller quantization candidates.

We use l_Δl(i) to represent the explicit quantization candidate associated with offset Δl as follows,

$$l_{\Delta l}(i) = l(i) + \Delta l. \tag{47}$$

Supposing C(i) is the absolute value of the transform coefficient at position i, the total distortion of a coding block when quantizing C(i) to l(i) can be described as

$$D_s = \sum_{k\neq i} D(k) + \left(l(i)\cdot Q_{step} - C(i)\right)^2. \tag{48}$$

Analogously, if the quantization result of C(i) is l_Δl(i), the distortion can be expressed as follows,

$$D_{\Delta l} = \sum_{k\neq i} D(k) + \left(l_{\Delta l}(i)\cdot Q_{step} - C(i)\right)^2. \tag{49}$$

The distortion difference between l(i) and l_Δl(i) with respect to Δl can be formulated as

$$\Delta D = D_s - D_{\Delta l} = -Q_{step}^2\cdot\Delta l^2 + 2\cdot Q_{step}\cdot\left(C(i) - l(i)\cdot Q_{step}\right)\cdot\Delta l. \tag{50}$$

According to Eqn. (50), ΔD is a concave quadratic function of Δl whose unconstrained maximum lies at Δl = (C(i) − l(i) · Q_step)/Q_step. Provided that l(i) is the level nearest to C(i)/Q_step, this value lies within [−1/2, 1/2], so among the integer offsets ΔD reaches its maximum value, zero, at Δl = 0. Therefore, l(i) always provides the lowest quantization distortion.

TABLE II
PERFORMANCE OF THE PROPOSED SCHEME BY POSTPONING THE TRELLIS DEPARTURE POINT UNDER AI AND RA CONFIGURATIONS

| Class | Sequence | AI BD-Rate(Y) | AI BD-Rate(U) | AI BD-Rate(V) | AI TS_Q | AI TS_Enc | RA BD-Rate(Y) | RA BD-Rate(U) | RA BD-Rate(V) | RA TS_Q | RA TS_Enc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A1 | Tango2 | 0.07% | 0.23% | 0.32% | 34% | 13% | 0.01% | 0.20% | 0.26% | 32% | 3% |
| | FoodMarket4 | 0.09% | 0.22% | 0.16% | 26% | 10% | 0.04% | -0.14% | -0.08% | 19% | 0% |
| | Campfire | 0.05% | 0.09% | 0.02% | 24% | 10% | 0.01% | 0.14% | 0.10% | 23% | 5% |
| A2 | CatRobot1 | 0.11% | 0.18% | 0.14% | 28% | 11% | -0.02% | 0.01% | 0.13% | 32% | 2% |
| | DaylightRoad2 | 0.16% | 0.41% | 0.20% | 26% | 11% | 0.07% | -0.14% | -0.09% | 33% | 1% |
| | ParkRunning3 | 0.03% | 0.13% | 0.17% | 17% | 7% | 0.01% | 0.12% | 0.16% | 21% | 1% |
| B | MarketPlace | 0.03% | 0.15% | 0.38% | 22% | 10% | 0.04% | 0.23% | 0.41% | 26% | 5% |
| | RitualDance | 0.06% | 0.28% | 0.15% | 21% | 9% | 0.04% | -0.22% | -0.18% | 22% | 5% |
| | Cactus | 0.07% | 0.09% | 0.21% | 22% | 10% | 0.03% | 0.10% | 0.29% | 27% | 6% |
| | BasketballDrive | 0.10% | 0.10% | 0.16% | 27% | 12% | 0.05% | -0.46% | -0.01% | 27% | 5% |
| | BQTerrace | 0.06% | 0.10% | 0.23% | 17% | 8% | 0.07% | 0.79% | 0.16% | 23% | 4% |
| C | BasketballDrill | 0.14% | 0.05% | 0.14% | 23% | 11% | 0.05% | -0.05% | 0.07% | 27% | 7% |
| | BQMall | 0.08% | 0.10% | 0.15% | 20% | 10% | 0.05% | -0.35% | -0.25% | 23% | 5% |
| | PartyScene | 0.09% | -0.03% | 0.15% | 13% | 6% | 0.08% | 0.10% | 0.30% | 19% | 6% |
| | RaceHorses | 0.08% | 0.05% | 0.00% | 17% | 8% | 0.03% | 0.19% | 0.21% | 21% | 5% |
| D | BasketballPass | 0.07% | 0.09% | 0.26% | 18% | 8% | 0.08% | -0.30% | -0.07% | 19% | 4% |
| | BQSquare | 0.10% | -0.02% | -0.20% | 12% | 5% | 0.00% | -0.53% | -1.09% | 17% | 3% |
| | BlowingBubbles | 0.11% | -0.22% | 0.37% | 13% | 7% | 0.17% | 0.69% | 0.41% | 20% | 4% |
| | RaceHorses | 0.14% | -0.25% | -0.67% | 15% | 6% | -0.05% | -0.28% | 0.55% | 19% | 3% |
| E | FourPeople | 0.11% | 0.17% | 0.20% | 19% | 9% | - | - | - | - | - |
| | Johnny | 0.09% | 0.40% | 0.11% | 21% | 9% | - | - | - | - | - |
| | KristenAndSara | 0.08% | 0.05% | 0.13% | 20% | 8% | - | - | - | - | - |
| F | BasketballDrillText | 0.11% | 0.07% | 0.07% | 19% | 8% | 0.05% | 0.09% | 0.10% | 25% | 6% |
| | ArenaOfValor | 0.11% | 0.06% | 0.18% | 17% | 4% | 0.10% | 0.09% | 0.18% | 22% | 6% |
| | SlideEditing | 0.01% | 0.09% | 0.21% | 7% | 4% | -0.03% | 0.07% | -0.02% | 10% | 2% |
| | SlideShow | 0.09% | 0.50% | 0.35% | 12% | 4% | 0.18% | -0.19% | -0.12% | 12% | 2% |
| Class A1 | | 0.07% | 0.18% | 0.16% | 28% | 11% | 0.02% | 0.06% | 0.09% | 25% | 3% |
| Class A2 | | 0.10% | 0.24% | 0.17% | 23% | 10% | 0.02% | 0.00% | 0.07% | 29% | 1% |
| Class B | | 0.07% | 0.14% | 0.23% | 22% | 10% | 0.04% | 0.09% | 0.13% | 25% | 5% |
| Class C | | 0.10% | 0.04% | 0.11% | 18% | 9% | 0.05% | -0.03% | 0.08% | 23% | 5% |
| Class D | | 0.10% | -0.10% | -0.06% | 14% | 7% | 0.05% | -0.11% | -0.05% | 19% | 3% |
| Class E | | 0.09% | 0.21% | 0.15% | 20% | 9% | - | - | - | - | - |
| Class F | | 0.08% | 0.18% | 0.20% | 14% | 5% | 0.07% | 0.02% | 0.04% | 17% | 4% |
| Overall | | 0.09% | 0.15% | 0.17% | 22% | 10% | 0.03% | 0.03% | 0.08% | 25% | 4% |

TABLE III
PERFORMANCE OF THE PROPOSED TRELLIS PRUNING METHOD UNDER AI AND RA CONFIGURATIONS

| Class | Sequence | AI BD-Rate(Y) | AI BD-Rate(U) | AI BD-Rate(V) | AI TS_Q | AI TS_Enc | RA BD-Rate(Y) | RA BD-Rate(U) | RA BD-Rate(V) | RA TS_Q | RA TS_Enc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A1 | Tango2 | 0.04% | 0.03% | 0.06% | 3% | 1% | 0.01% | -0.05% | 0.27% | 5% | 4% |
| | FoodMarket4 | 0.06% | 0.03% | 0.11% | 2% | 1% | 0.00% | -0.20% | 0.09% | 3% | 4% |
| | Campfire | 0.01% | 0.05% | 0.01% | 3% | 1% | 0.00% | 0.05% | -0.09% | 4% | 1% |
| A2 | CatRobot1 | 0.04% | 0.13% | -0.06% | 4% | 1% | -0.01% | -0.12% | 0.00% | 4% | 3% |
| | DaylightRoad2 | 0.02% | 0.21% | 0.08% | 4% | 2% | 0.01% | 0.01% | -0.04% | 7% | 3% |
| | ParkRunning3 | 0.05% | 0.05% | 0.02% | 3% | 1% | 0.01% | 0.04% | 0.02% | 3% | 7% |
| B | MarketPlace | 0.03% | 0.07% | 0.19% | 2% | 1% | -0.01% | 0.00% | -0.22% | 2% | 2% |
| | RitualDance | 0.02% | 0.30% | -0.01% | 3% | 2% | 0.01% | 0.03% | 0.00% | 2% | 2% |
| | Cactus | 0.03% | -0.01% | 0.11% | 3% | 1% | 0.03% | -0.23% | -0.03% | 3% | 0% |
| | BasketballDrive | 0.04% | 0.09% | -0.07% | 4% | 1% | 0.01% | -0.48% | -0.15% | 3% | 0% |
| | BQTerrace | 0.04% | -0.15% | 0.12% | 2% | 1% | 0.05% | 0.57% | -0.53% | 3% | 0% |
| C | BasketballDrill | 0.03% | -0.06% | 0.09% | 0% | 0% | 0.04% | 0.07% | -0.17% | 2% | 0% |
| | BQMall | 0.01% | -0.04% | -0.19% | 3% | 0% | 0.03% | -0.13% | -0.26% | 1% | -1% |
| | PartyScene | 0.03% | -0.01% | -0.03% | 0% | 2% | 0.02% | 0.15% | -0.28% | 0% | -1% |
| | RaceHorses | 0.06% | -0.09% | 0.02% | 2% | 0% | 0.02% | -0.21% | 0.00% | -1% | -4% |
| D | BasketballPass | 0.03% | 0.04% | 0.10% | 1% | 1% | -0.06% | -1.01% | 0.02% | 0% | 0% |
| | BQSquare | 0.06% | -0.21% | -0.10% | 1% | 0% | -0.05% | -0.87% | -1.24% | 0% | 0% |
| | BlowingBubbles | 0.06% | -0.11% | -0.17% | 0% | 1% | 0.09% | 0.18% | 0.13% | 0% | 0% |
| | RaceHorses | 0.04% | 0.02% | -0.44% | 1% | 1% | -0.02% | -0.79% | -0.06% | -1% | 0% |
| E | FourPeople | 0.01% | -0.02% | 0.08% | 3% | 1% | - | - | - | - | - |
| | Johnny | 0.03% | 0.25% | -0.11% | 2% | 0% | - | - | - | - | - |
| | KristenAndSara | 0.01% | -0.06% | -0.01% | 4% | 1% | - | - | - | - | - |
| F | BasketballDrillText | 0.04% | -0.01% | -0.01% | 2% | 1% | 0.03% | -0.14% | 0.18% | 2% | -1% |
| | ArenaOfValor | 0.02% | -0.02% | 0.06% | 2% | 1% | -0.01% | 0.02% | -0.06% | -2% | 0% |
| | SlideEditing | 0.04% | 0.04% | 0.10% | 1% | 1% | -0.07% | -0.01% | -0.11% | 2% | 1% |
| | SlideShow | 0.11% | -0.07% | -0.22% | 1% | 0% | 0.00% | -0.37% | -0.39% | 2% | 0% |
| Class A1 | | 0.03% | 0.04% | 0.06% | 3% | 1% | 0.01% | -0.07% | 0.09% | 4% | 3% |
| Class A2 | | 0.03% | 0.13% | 0.01% | 3% | 2% | 0.00% | -0.02% | -0.01% | 5% | 4% |
| Class B | | 0.03% | 0.06% | 0.07% | 3% | 1% | 0.02% | -0.02% | -0.18% | 3% | 1% |
| Class C | | 0.03% | -0.05% | -0.03% | 1% | 1% | 0.03% | -0.03% | -0.18% | 0% | -2% |
| Class D | | 0.05% | -0.06% | -0.15% | 1% | 1% | -0.01% | -0.62% | -0.29% | 0% | 0% |
| Class E | | 0.01% | 0.06% | -0.01% | 3% | 1% | - | - | - | - | - |
| Class F | | 0.05% | -0.02% | -0.02% | 1% | 1% | -0.01% | -0.13% | -0.10% | 1% | 0% |
| Overall | | 0.03% | 0.04% | 0.02% | 3% | 1% | 0.01% | -0.03% | -0.09% | 3% | 1% |

Meanwhile, the rate difference can be estimated with our proposed model in Eqn. (26) when l(i) is changed to l_Δl(i) as follows,

$$\Delta R = R_s - R_{\Delta l} = \left(\alpha\cdot L + \beta\cdot\tilde{L} + \gamma\cdot R_{LP} + \epsilon\right) - \left[\alpha\cdot(L + \eta) + \beta\cdot\left(\tilde{L} + \widetilde{\Delta l}\right) + \gamma\cdot R_{LP} + \epsilon\right] = -\alpha\cdot\eta - \beta\cdot\widetilde{\Delta l}, \tag{51}$$

where the α-term counts the non-zero coefficients and the β-term accumulates their coded indices, and Δ̃l is the difference of the coded index when the quantization level is adjusted by Δl, which can be calculated as follows,

$$\widetilde{\Delta l} = \tilde{l}_{\Delta l}(i) - \tilde{l}(i). \tag{52}$$

Herein, l̃_Δl(i) and l̃(i) represent the coded indices of the quantization candidates l_Δl(i) and l(i), respectively. Moreover, η denotes the variation in the number of non-zero coefficients, which can be determined as follows,

$$\eta = \begin{cases} 0, & l(i) \neq 0 \text{ and } \tilde{l}_{\Delta l}(i) \neq 0,\\ 1, & \tilde{l}(i) = 0 \text{ and } \tilde{l}_{\Delta l}(i) \neq 0,\\ -1, & l(i) \neq 0 \text{ and } \tilde{l}_{\Delta l}(i) = 0. \end{cases} \tag{53}$$

Subsequently, we discuss the variations of the rate and distortion case by case.
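Before the individual cases, the quantities they rely on (the candidate offsets of Eqn. (46), the distortion change of Eqn. (50), and the rate change of Eqns. (51)–(53)) can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' code: the level-to-index mapping coded_index (l̃ = ⌈l/2⌉, so levels 1 and 2 share index 1 and levels 3 and 4 share index 2) is an assumption consistent with the index/level pairs used in this section, and all function names and parameter values are hypothetical.

```python
def candidate_offsets(level: int) -> list[int]:
    """Eqn. (46): the five level offsets examined for one coefficient."""
    if level > 2:
        return [-level, -2, -1, 0, 1]
    if level == 2:
        return [-2, -1, 0, 1, 2]
    if level == 1:
        return [-1, 0, 1, 2, 3]
    return [0, 1, 2, 3, 4]


def coded_index(level: int) -> int:
    """Assumed level-to-index mapping: 0 -> 0, {1, 2} -> 1, {3, 4} -> 2, ..."""
    return (level + 1) // 2


def delta_distortion(c: float, level: int, dl: int, q_step: float) -> float:
    """Eqn. (50): D_s - D_dl when the level moves from l(i) to l(i) + dl."""
    return -q_step ** 2 * dl ** 2 + 2.0 * q_step * (c - level * q_step) * dl


def eta(level: int, new_level: int) -> int:
    """Eqn. (53): change in the number of non-zero coefficients."""
    return int(new_level != 0) - int(level != 0)


def delta_rate(level: int, dl: int, alpha: float, beta: float) -> float:
    """Eqn. (51): R_s - R_dl = -alpha * eta - beta * (coded index change)."""
    new_level = level + dl
    d_idx = coded_index(new_level) - coded_index(level)  # Eqn. (52)
    return -alpha * eta(level, new_level) - beta * d_idx


if __name__ == "__main__":
    # Hypothetical values: C(i) = 23 with Q_step = 10 gives l(i) = 2.
    for dl in candidate_offsets(2):
        print(dl, delta_distortion(23.0, 2, dl, 10.0), delta_rate(2, dl, 3.0, 2.0))
```

With C(i) = 23 and Q_step = 10, every non-zero offset yields ΔD < 0, i.e. a higher distortion than l(i) = 2, in line with Eqn. (50); the sign of ΔR then decides whether a candidate can at least save bits.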

1) l(i) = 0: in this case, Δl and Δ̃l are both non-negative, such that η equals 1 for every non-zero candidate. Since the parameters α and β are positive, Eqn. (51) can be written as follows,

$$\Delta R = -\alpha - \beta\cdot\widetilde{\Delta l} \le 0. \tag{54}$$

Both the distortion and the rate increase if l(i) is adjusted to l_Δl(i) in this scenario, which implies that the remaining quantization candidates would likely introduce a higher RD cost, leading to coding performance loss. As such, it is reasonable to directly remove the quantization candidates with higher levels without further calculation of the RD cost.

2) l(i) = 1 or 2: the corresponding quantization index l̃(i) is "1". The explicit value of ΔR depends on Δ̃l, which is formulated as follows,

$$\Delta R = \begin{cases} \alpha + \beta, & \widetilde{\Delta l} = -1,\\ -\beta\cdot\widetilde{\Delta l}, & \widetilde{\Delta l} = 0 \text{ or } 1. \end{cases} \tag{55}$$

It can be observed that when Δ̃l equals −1, ΔR is a positive constant, indicating a saving of coding bits. However, it is difficult to intuitively predict the final variation of the RD cost, since the associated distortion also increases. Moreover, it can be inferred that a positive Δ̃l leads to an increase of the coding bits. Therefore, the larger quantization candidates are removed from the trellis graph in such scenarios.

3) l(i) > 2: Though a negative Δ̃l results in a saving of coding bits, it is still difficult to predict the actual variation of the RD cost,

$$\Delta R = \begin{cases} \alpha + \beta\cdot\tilde{l}(i), & \widetilde{\Delta l} = -\tilde{l}(i),\\ -\beta\cdot\widetilde{\Delta l}, & \widetilde{\Delta l} = -1, 0, \text{ or } 1. \end{cases} \tag{56}$$

For the case that Δ̃l equals −l̃(i), a remarkable increase of distortion can be noticed, which cannot be well compensated by the saving of coding bits. As such, it is proposed to eliminate the checking of candidate level 0.

We provide two example quantization candidate sets to better illustrate the technical details of pruning. The first set, L = {0[0], 1[1], 2[1], 3[2], 4[2]}, conforms to the former two cases, where the numbers inside and outside of the square brackets denote the quantization indices and quantization levels, respectively. The proposed pruning procedure is demonstrated in Fig. 6. Initially, following the transition rule of TCQ, candidates "2[1]" and "4[2]" are coupled and bound to the quantizer Q0, where State 0 and State 1 are assigned as the transition states. Similarly, "1[1]" and "3[2]" are paired with quantizer Q1, associated with State 2 and State 3. The candidate "0[0]" is bound to every state. The accumulated RD cost is calculated for each node with consideration of all available branches, and the branch with the minimal cost is retained in the trellis graph. With the proposed method, the unlikely-selected branches with larger quantization levels, such as "3[2]" and "4[2]", are directly pruned without RD checking, leading to lower computational complexity. The second set corresponds to the last case, where the transition routes of "0[0]" can be pruned, as illustrated in Fig. 7. Since the proposed pruning method does not have any influence on the state transition, the dequantization process remains consistent. In practice, the conditions on l(i) (i.e., l(i) = 0, l(i) ∈ {1, 2}, and l(i) > 2) are mapped to comparisons between C(i) and associated thresholds, in order to avoid the calculation of l(i).
In this way, given C(i), the pruning strategy can be efficiently determined.

V. EXPERIMENTAL RESULTS

A. Performance Evaluations

The proposed low complexity TCQ approaches are implemented on the VVC test platform VTM-4.0 [32]. Simulations are conducted following the JVET Common Test Conditions (CTC) [33], where all recommended test sequences from class A to class F are included in the experiment under the AI and RA configurations. The QP values are set as {22, 27, 32, 37}, and BD-Rates [34] for the Y, U and V components are used to evaluate the coding performance, where a negative value denotes a performance gain. The computational complexity reduction is measured with the total encoding time saving TS_Enc and the quantization time saving TS_Q as follows,

$$TS_{Enc} = \frac{T^{Anc}_{Enc} - T^{Pro}_{Enc}}{T^{Anc}_{Enc}}\times 100\%, \qquad TS_{Q} = \frac{T^{Anc}_{Q} - T^{Pro}_{Q}}{T^{Anc}_{Q}}\times 100\%, \tag{57}$$

where T^Pro_Enc and T^Anc_Enc denote the total elapsed encoding time with the proposed low complexity scheme and the original anchor, respectively. Analogously, T^Pro_Q and T^Anc_Q stand for the quantization time of the proposed scheme and the anchor, respectively.

Experimental results of the proposed scheme by postponing the trellis initial point are shown in Table II, where the time savings and the coding performance of each individual sequence are presented. To be more specific, the proposed method brings 22% quantization time savings and 10% encoding time savings under the AI configuration. The performance loss is marginal, with 0.09%, 0.15% and 0.17% BD-Rate increases for the Y, U and V components, respectively. Similar performance can be observed under the RA configuration, where the time saving in quantization is 25% and the BD-Rate loss is only 0.03%, which strikes an excellent compromise between coding performance and computational complexity. Table III shows the performance of the proposed trellis pruning method, with which 3% quantization time reduction and 1% encoding time reduction can be achieved. Since the branch pruning is guided by the RD model, there is only 0.03% and 0.01% loss on average under the AI and RA configurations. Moreover, it is also interesting to see that performance gains may even occur on the chroma components under the RA configuration. In addition, we combine the trellis pruning method with the initial point postponing method, and the experimental results are provided in Table IV. By combining these two approaches, on average 24% and 27% quantization time savings can be achieved under the AI and RA configurations with only 0.11% and 0.05% BD-Rate loss. Moreover, the overall encoding time savings are 11% and 5% under the AI and RA configurations, respectively.
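The departure-point rule evaluated in Tables II, V and VI, together with the time-saving measure of Eqn. (57), can be sketched as below. This is a minimal sketch, not the VTM code: it assumes the coefficients are available as floating-point values in scan order and that the trellis is traversed from the end of the scan towards position 0. The function names are hypothetical; k_from_model follows Eqn. (45), while a constant ("safe" or "risky") setting of K can simply be passed to trellis_start directly.

```python
import math
from typing import Sequence


def k_from_model(phi: float, alpha: float, beta: float) -> float:
    """Eqn. (45): K = sqrt(phi * alpha) + phi * beta / 4."""
    return math.sqrt(phi * alpha) + phi * beta / 4.0


def trellis_start(coeffs: Sequence[float], q_step: float, k: float) -> int:
    """Scan index at which the trellis starts, or -1 if all are zeroed.

    coeffs holds the transform coefficients in scan order; the trellis is
    traversed from the end of the scan towards position 0.
    """
    thres = q_step * k  # Eqn. (44): the minimum of T is used as the threshold
    for pos in range(len(coeffs) - 1, -1, -1):
        if abs(coeffs[pos]) > thres:
            return pos  # Eqn. (41) fails: this coefficient starts the trellis
        # |C| <= thres: quantize to zero and postpone the departure point
    return -1


def time_saving(t_anchor: float, t_proposed: float) -> float:
    """Eqn. (57): relative time saving in percent."""
    return (t_anchor - t_proposed) / t_anchor * 100.0


if __name__ == "__main__":
    block = [50.0, -3.0, 8.0, 0.0, 2.0]  # toy coefficients in scan order
    print(trellis_start(block, q_step=2.0, k=1.5))  # threshold 3.0 -> 2
    print(trellis_start(block, q_step=2.0, k=5.0))  # threshold 10.0 -> 0
```

A larger K moves the departure point further towards the beginning of the scan, so fewer trellis stages remain; this mirrors the trade-off between the safe and risky settings of K discussed in the following subsection.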

B. Investigations and Discussions

In this subsection, we provide more detailed analyses of the proposed schemes. First, we investigate the parameter K in Eqn. (45) to further evaluate the robustness of the proposed method of postponing the trellis initial point. As indicated in Eqn. (45), different settings of K result in different scaling factors for the threshold that determines the trellis initial point. Constant settings of K are attempted to further verify the effectiveness of postponing the trellis initial point. More specifically, K is first set to a safe value. The corresponding experimental results are tabulated in Table V, where the time consumed by TCQ is reduced by 16% and 22% under the AI and RA configurations, and the corresponding encoding time decreases are 6% and 3%, respectively. Moreover, the coding performance loss is negligible, especially under the RA configuration. As such, the safe setting of K has been adopted as an encoder optimization method in the VVC software [35]. Furthermore, we set K to a risky value and provide the coding performance in Table VI. It can be observed that on average 30% of the quantization time is saved under the AI configuration, and the encoding time saving is 14%. With the RA configuration, the risky K achieves 37% quantization time and 7% encoding time savings. Meanwhile, 0.24% and 0.12% BD-Rate losses are introduced by the risky K under the AI and RA configurations, respectively. Therefore, different settings of K provide a flexible trade-off between the complexity reduction and the coding performance.

Subsequently, we study the proposed method from the perspective of the operation complexity, which essentially depends on the number of stages and branches, as shown in Eqn. (34). Considering that three branches are launched from each state node and each stage contains five states in the practical scenario, the total branch number reaches 15 · N_m for a block with N_m middle stages, where the start and end points are excluded for clarity. The explicit operation complexities of the BMU and ACS modules are summarized in Table VII. Supposing that the proposed method of postponing the trellis initial point reduces the number of stages from N_m to N*_m, by combining it with the trellis pruning scheme, the total branch number within one coding block can be reduced to at most … · N*_m, such that the operations in the BMU and ACS modules are decreased accordingly. Therefore, the proposed method brings an overall simplification of the operation complexity in TCQ. It is also worth mentioning that quantization plays a critical role in lossy coding, and alleviating the computational complexity of quantization is highly desirable for the implementation of VVC encoders in the near future. The proposed methods are capable of achieving progressive reductions in quantization complexity with very marginal performance loss on the VVC-based coding platform, which provides insights for the subsequent development of commercial encoders.

TABLE IV
PERFORMANCE OF THE COMBINATION OF THE PROPOSED TRELLIS INITIAL POINT POSTPONING AND TRELLIS PRUNING METHODS UNDER AI AND RA CONFIGURATIONS

| Class | Sequence | AI BD-Rate(Y) | AI BD-Rate(U) | AI BD-Rate(V) | AI TS_Q | AI TS_Enc | RA BD-Rate(Y) | RA BD-Rate(U) | RA BD-Rate(V) | RA TS_Q | RA TS_Enc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A1 | Tango2 | 0.08% | 0.25% | 0.38% | 35% | 15% | 0.03% | -0.08% | 0.11% | 36% | 5% |
| | FoodMarket4 | 0.13% | 0.30% | 0.20% | 28% | 11% | 0.09% | 0.13% | 0.10% | 30% | 2% |
| | Campfire | 0.07% | 0.13% | 0.12% | 27% | 12% | 0.02% | 0.17% | 0.06% | 27% | 5% |
| A2 | CatRobot1 | 0.11% | 0.26% | 0.26% | 29% | 13% | -0.01% | -0.03% | 0.10% | 36% | 9% |
| | DaylightRoad2 | 0.17% | 0.41% | 0.25% | 28% | 13% | 0.03% | 0.14% | 0.08% | 37% | 9% |
| | ParkRunning3 | 0.08% | 0.17% | 0.23% | 19% | 8% | 0.04% | 0.13% | 0.14% | 26% | 5% |
| B | MarketPlace | 0.08% | 0.20% | 0.42% | 25% | 11% | 0.03% | -0.04% | 0.18% | 28% | 6% |
| | RitualDance | 0.08% | 0.23% | 0.35% | 22% | 10% | 0.07% | -0.13% | 0.16% | 24% | 4% |
| | Cactus | 0.10% | 0.13% | 0.30% | 25% | 12% | 0.06% | 0.02% | -0.02% | 26% | 6% |
| | BasketballDrive | 0.09% | 0.18% | 0.26% | 27% | 13% | 0.07% | -0.11% | 0.00% | 26% | 3% |
| | BQTerrace | 0.08% | 0.12% | 0.40% | 20% | 10% | 0.05% | 0.75% | -0.12% | 24% | 3% |
| C | BasketballDrill | 0.15% | 0.04% | 0.13% | 22% | 11% | 0.08% | -0.04% | -0.15% | 27% | 4% |
| | BQMall | 0.10% | 0.15% | 0.10% | 23% | 10% | 0.05% | -0.04% | -0.08% | 23% | 2% |
| | PartyScene | 0.12% | -0.04% | 0.30% | 14% | 8% | 0.03% | 0.07% | 0.27% | 20% | 3% |
| | RaceHorses | 0.12% | 0.02% | 0.15% | 17% | 9% | 0.07% | 0.21% | -0.32% | 22% | 3% |
| D | BasketballPass | 0.09% | 0.51% | 0.16% | 18% | 8% | 0.09% | -0.10% | -0.24% | 19% | 5% |
| | BQSquare | 0.16% | -0.05% | -0.35% | 12% | 6% | 0.03% | -0.18% | -1.44% | 17% | 4% |
| | BlowingBubbles | 0.15% | 0.16% | 0.12% | 13% | 7% | 0.14% | 0.29% | 0.39% | 20% | 6% |
| | RaceHorses | 0.12% | 0.05% | -0.33% | 15% | 6% | 0.01% | -0.33% | 0.31% | 17% | 5% |
| E | FourPeople | 0.12% | 0.03% | 0.31% | 20% | 10% | - | - | - | - | - |
| | Johnny | 0.12% | 0.21% | 0.07% | 22% | 10% | - | - | - | - | - |
| | KristenAndSara | 0.10% | 0.00% | 0.16% | 21% | 9% | - | - | - | - | - |
| F | BasketballDrillText | 0.15% | 0.00% | 0.05% | 21% | 8% | 0.10% | 0.26% | 0.15% | 25% | 7% |
| | ArenaOfValor | 0.12% | 0.13% | 0.11% | 20% | 6% | 0.08% | 0.21% | 0.15% | 24% | 7% |
| | SlideEditing | 0.09% | 0.12% | 0.22% | 8% | 4% | 0.01% | -0.04% | 0.00% | 9% | 2% |
| | SlideShow | 0.14% | 0.16% | -0.08% | 12% | 4% | 0.16% | -0.30% | -0.38% | 10% | 3% |
| Class A1 | | 0.10% | 0.23% | 0.23% | 30% | 13% | 0.05% | 0.07% | 0.09% | 31% | 4% |
| Class A2 | | 0.12% | 0.28% | 0.25% | 26% | 12% | 0.02% | 0.08% | 0.11% | 33% | 8% |
| Class B | | 0.09% | 0.17% | 0.35% | 24% | 11% | 0.06% | 0.10% | 0.04% | 26% | 4% |
| Class C | | 0.12% | 0.05% | 0.17% | 19% | 10% | 0.06% | 0.05% | -0.07% | 23% | 3% |
| Class D | | 0.13% | 0.17% | -0.10% | 15% | 7% | 0.07% | -0.08% | -0.25% | 18% | 5% |
| Class E | | 0.11% | 0.08% | 0.18% | 21% | 10% | - | - | - | - | - |
| Class F | | 0.13% | 0.10% | 0.07% | 15% | 6% | 0.09% | 0.03% | -0.02% | 17% | 5% |
| Overall | | 0.11% | 0.16% | 0.24% | 24% | 11% | 0.05% | 0.08% | 0.03% | 27% | 5% |

TABLE V
PERFORMANCE OF THE PROPOSED SCHEME BY POSTPONING THE TRELLIS DEPARTURE POINT WITH THE SAFE SETTING OF K UNDER AI AND RA CONFIGURATIONS

| Class | AI BD-Rate(Y) | AI BD-Rate(U) | AI BD-Rate(V) | AI TS_Q | AI TS_Enc | RA BD-Rate(Y) | RA BD-Rate(U) | RA BD-Rate(V) | RA TS_Q | RA TS_Enc |
|---|---|---|---|---|---|---|---|---|---|---|
| A1 | 0.02% | 0.18% | 0.07% | 21% | 8% | 0.01% | 0.01% | 0.01% | 24% | 3% |
| A2 | 0.01% | 0.09% | 0.01% | 19% | 8% | -0.02% | 0.08% | 0.01% | 26% | 5% |
| B | 0.02% | 0.00% | 0.08% | 16% | 6% | 0.02% | 0.08% | -0.17% | 20% | 3% |
| C | 0.04% | -0.05% | -0.04% | 13% | 4% | 0.00% | -0.10% | 0.09% | 19% | 2% |
| D | 0.03% | 0.02% | -0.05% | 10% | 3% | 0.00% | -0.28% | -0.19% | 14% | 1% |
| E | 0.03% | 0.06% | 0.01% | 14% | 5% | - | - | - | - | - |
| F | 0.05% | -0.04% | -0.02% | 10% | 3% | 0.01% | 0.00% | 0.02% | 13% | 1% |
| Overall | 0.02% | 0.05% | 0.03% | 16% | 6% | 0.01% | 0.02% | -0.03% | 22% | 3% |

TABLE VI
PERFORMANCE OF THE PROPOSED SCHEME BY POSTPONING THE TRELLIS DEPARTURE POINT WITH THE RISKY SETTING OF K UNDER AI AND RA CONFIGURATIONS

| Class | AI BD-Rate(Y) | AI BD-Rate(U) | AI BD-Rate(V) | AI TS_Q | AI TS_Enc | RA BD-Rate(Y) | RA BD-Rate(U) | RA BD-Rate(V) | RA TS_Q | RA TS_Enc |
|---|---|---|---|---|---|---|---|---|---|---|
| A1 | 0.23% | 0.81% | 0.79% | 36% | 15% | 0.17% | 0.59% | 0.49% | 39% | 7% |
| A2 | 0.18% | 0.63% | 0.60% | 32% | 14% | 0.13% | 0.53% | 0.48% | 42% | 7% |
| B | 0.21% | 0.54% | 0.83% | 30% | 14% | 0.13% | 0.70% | 0.79% | 36% | 8% |
| C | 0.27% | 0.45% | 0.58% | 24% | 12% | 0.16% | 0.54% | 0.60% | 33% | 7% |
| D | 0.28% | 0.52% | 0.52% | 20% | 9% | 0.12% | 0.63% | 0.22% | 28% | 6% |
| E | 0.31% | 0.60% | 0.61% | 26% | 13% | - | - | - | - | - |
| F | 0.24% | 0.41% | 0.47% | 19% | 7% | 0.12% | 0.41% | 0.46% | 24% | 5% |
| Overall | 0.24% | 0.59% | 0.69% | 30% | 14% | 0.12% | 0.50% | 0.51% | 37% | 7% |

TABLE VII
COMPLEXITY ANALYSES OF THE ORIGINAL TCQ AND THE PROPOSED METHOD

| Module | Branch | BMU: Distortion | BMU: Rate | ACS: Add | ACS: Compare | ACS: Select |
|---|---|---|---|---|---|---|
| TCQ | 15 · N_m | … · N_m | … · N_m | … · N_m | … · N_m | … · N_m |
| Proposed | … · N*_m ∼ … · N*_m | … · N*_m ∼ … · N*_m | … · N*_m ∼ … · N*_m | … · N*_m ∼ … · N*_m | … · N*_m ∼ … · N*_m | … · N*_m ∼ … · N*_m |

VI. CONCLUSIONS

This paper proposes a low complexity TCQ scheme for VVC encoding. The novelty of this paper lies in identifying the prominent and deterministic factors that influence the coding complexity of TCQ in VVC, based on which low complexity quantization schemes are developed from theoretically established rate and distortion models. Experimental results show that the proposed scheme achieves 11% and 5% encoding time savings, and 24% and 27% quantization time savings on average under the AI and RA configurations, respectively. The coding performance loss is very marginal, with 0.11% and 0.05% BD-Rate increases under the AI and RA configurations. Further investigations also show that the proposed method consistently reduces the operation complexity of quantization and achieves progressive complexity reductions with moderate coding performance loss.

REFERENCES

[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, July 2003.
[2] G. J. Sullivan, J. Ohm, W. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[3] W. Gao and S. Ma, “An overview of AVS2 standard,” Advanced Video Coding Systems, vol. 22, pp. 35–49, Jan. 2014.
[4] B. Bross, J. Chen, and S. Liu, “Versatile video coding (draft 4),” JVET-M1001, 2019.
[5] G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74–90, Nov. 1998.
[6] H. Everett III, “Generalized Lagrange multiplier method for solving problems of optimum allocation of resources,” Operations Research, vol. 11, no. 3, pp. 399–417.
[7] Y.-J. Ahn, T.-J. Hwang, D.-G. Sim, and W.-J. Han, “Implementation of fast HEVC encoder based on SIMD and data-level parallelism,” EURASIP Journal on Image and Video Processing, vol. 2014, no. 1, p. 16, Mar. 2014. [Online]. Available: https://doi.org/10.1186/1687-5281-2014-16
[8] H. Yang, L. Shen, X. Dong, Q. Ding, P. An, and G. Jiang, “Low complexity CTU partition structure decision and fast intra mode decision for versatile video coding,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 1–1, 2019.
[9] H. Yin, E. Yang, X. Yu, and Z. Xia, “Fast soft decision quantization with adaptive preselection and dynamic trellis graph,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 8, pp. 1362–1375, Aug. 2015.
[10] H. Lee, S. Yang, Y. Park, and B. Jeon, “Fast quantization method with simplified rate-distortion optimized quantization for an HEVC encoder,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 1, pp. 107–116, Jan. 2016.
[11] J. Wen, M. Luttrell, and J. Villasenor, “Trellis-based R-D optimal quantization in H.263+,” IEEE Transactions on Image Processing, vol. 9, no. 8, pp. 1431–1434, Aug. 2000.
[12] E. Yang and X. Yu, “Soft decision quantization for H.264 with main profile compatibility,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 1, pp. 122–127, Jan. 2009.
[13] M. Karczewicz, Y. Ye, and I. Cheong, “Rate distortion optimized quantization, document VCEG-AH21,” Jan. 2008.
[14] G. Sullivan, “Adaptive quantization encoding technique using an equal expected-value rule, document JVT-N011,” Joint Video Team of ISO/IEC and ITU-T, Hong Kong, 2005.
[15] H. Schwarz, T. Nguyen, D. Marpe, and T. Wiegand, “Hybrid video coding with trellis-coded quantization,” in …, Mar. 2019, pp. 182–191.
[16] H. Schwarz, T. Nguyen, D. Marpe, T. Wiegand, M. Karczewicz, M. Coban, and J. Dong, “Improved quantization and transform coefficient coding for the emerging versatile video coding (VVC) standard,” in …, Sep. 2019, pp. 1183–1187.
[17] T. Huang and H. H. Chen, “Efficient quantization based on rate-distortion optimization for video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 6, pp. …
[18] … IEEE Transactions on Image Processing, vol. 26, no. 8, pp. 3802–3816, Aug. 2017.
[19] R. Reininger and J. Gibson, “Distributions of the two-dimensional DCT coefficients for images,” IEEE Transactions on Communications, vol. 31, no. 6, pp. 835–839, June 1983.
[20] G. S. Yovanof and S. Liu, “Statistical analysis of the DCT coefficients and their quantization error,” in Conference Record of The Thirtieth Asilomar Conference on Signals, Systems and Computers, vol. 1, Nov. 1996, pp. 601–605.
[21] E. Y. Lam and J. W. Goodman, “A mathematical analysis of the DCT coefficient distributions for images,” IEEE Transactions on Image Processing, vol. 9, no. 10, pp. 1661–1666, Oct. 2000.
[22] N. Kamaci, Y. Altunbasak, and R. M. Mersereau, “Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 8, pp. 994–1006, Aug. 2005.
[23] X. Zhao, J. Sun, S. Ma, and W. Gao, “Novel statistical modeling, analysis and implementation of rate-distortion estimation for H.264/AVC coders,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 5, pp. 647–660, May 2010.
[24] X. Li, N. Oertel, A. Hutter, and A. Kaup, “Laplace distribution based Lagrangian rate distortion optimization for hybrid video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 2, pp. 193–205, Feb. 2009.
[25] E. Yang, X. Yu, J. Meng, and C. Sun, “Transparent composite model for DCT coefficients: Design and analysis,” IEEE Transactions on Image Processing, vol. 23, no. 3, pp. 1303–1316, Mar. 2014.
[26] Z. He and S. K. Mitra, “A linear source model and a unified rate control algorithm for DCT video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 11, pp. 970–982, Nov. 2002.
[27] S.-C. Chang, J.-F. Yang, C.-F. Lee, and J.-N. Hwang, “A novel rate predictor based on quantized DCT indices and its rate control mechanism,” Signal Processing: Image Communication, vol. 18, no. 6, pp. 427–441, 2003.
[28] Y.-K. Tu, J.-F. Yang, and M.-T. Sun, “Efficient rate-distortion estimation for H.264/AVC coders,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 5, pp. 600–611, May 2006.
[29] Q. Chen and Y. He, “A fast bits estimation method for rate-distortion optimization in H.264/AVC,” in Proc. Picture Coding Symp. (PCS), Dec. 2004, pp. 133–134.
[30] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
[31] M. W. Marcellin and T. R. Fischer, “Trellis coded quantization of memoryless and Gauss-Markov sources,” IEEE Transactions on Communications, vol. 38, no. 1, pp. 82–93, Jan. 1990.
[32] “VVC software VTM-4.0,” https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/tags/VTM-4.0/.
[33] F. Bossen, J. Boyce, K. Suehring, X. Li, and V. Seregin, “JVET common test conditions and software reference configurations for SDR video,” Joint Video Exploration Team (JVET), doc. JVET-M1010, Jan. 2019.
[34] G. Bjøntegaard, “Calculation of average PSNR differences between RD-curves,” ITU-T SG 16 Q.6 VCEG-M33, 2001.
[35] M. Wang, J. Li, L. Zhang, K. Zhang, H. Liu, S. Wang, and S. Ma, “Non-CE: Fast encoder with adjusted threshold in dependent quantization,”