Relying on a rate constraint to reduce Motion Estimation complexity
Gabriel B. Sant'Anna, Luiz Henrique Cancellier, Ismael Seidel, Mateus Grellert, José Luís Güntzel
Copyright 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Embedded Computing Laboratory (ECL) - Dept. of Computer Science and Statistics (INE)
Federal University of Santa Catarina (UFSC) - Florianópolis, Brazil
{baiocchi.gabriel, luizhenriquecancellier, ismaelseidel}@gmail.com, {mateus.grellert, j.guntzel}@ufsc.br

ABSTRACT
This paper proposes a rate-based candidate elimination strategy for Motion Estimation, which is considered one of the main sources of encoder complexity. We build on findings from previous works showing that selected motion vectors are generally near the predictor, and propose a solution that uses the motion vector bitrate to constrain the candidate search to a subset of the original search window, resulting in fewer distortion computations. The proposed method is not tied to a particular search pattern, which makes it applicable to several ME strategies. The technique was tested in the VVC reference software implementation and showed complexity reductions of over 80% at the cost of an average 0.74% increase in BD-Rate with respect to the original TZ Search algorithm in the LDP configuration.
Index Terms: Video Coding, Integer Motion Estimation, Block-Matching Algorithm, Rate-Constraint, VVC
1. INTRODUCTION
Due to a growing consumer demand for video playback services and high-definition content, video data amounts to the majority of global internet traffic [1], even more so during the COVID-19 pandemic, which has caused a worldwide increase in digital media consumption [2]. As the demand for transmitted video content expands, the need for advanced compression techniques increases.

The Joint Video Experts Team (JVET) of ITU-T developed Versatile Video Coding (VVC), a state-of-the-art video coding standard that improves over its predecessor, High Efficiency Video Coding (HEVC). The newly achieved compression ratio is paired with a rise in encoder complexity: the VVC reference software implementation presents an average 44.4% bitrate reduction at the cost of an approximately 10× higher encoding time w.r.t. that of HEVC [3]. This increased complexity prevents some applications, such as real-time streaming and video coding in battery-constrained devices, from making full use of the new standard, thus motivating research efforts to improve encoding performance.

Motion Estimation (ME) has been regarded as one of the most time-consuming operations of video compression, being responsible for 40% to 80% of encoding time when using Full Search (FS) [4]. Many different fast ME algorithms attempt to reduce this cost by limiting the Integer Motion Estimation (IME) search region to as few points as possible [5], but while this strategy does improve performance, the search becomes sub-optimal and may get trapped in a local minimum [6]. The Test Zone Search (TZS) algorithm, adopted as the default IME method in the reference software implementations of both HEVC [7] and VVC [8], mitigates this problem by combining a variant of Diamond Search with the semi-exhaustive Raster Search [9].

Although TZS brings ME complexity down when compared to FS, the number of searched points is still considered too large for real-time applications [4] [10]. In addition, the iterative nature of its search steps prevents efficient parallel implementations [9]. The "Octagonal-Axis Raster" search, proposed by [11], exemplifies how a smaller number of search points can reduce ME complexity with a negligible increase in Bjøntegaard Delta Bitrate (BD-Rate). It was designed to account for Motion Vector (MV) distributions averaged over decoded video sequences in order to limit the search to regions with the highest occurrences of the best MVs. Unfortunately, this method cannot be generalized to different algorithms because it operates in a specific TZS step. Meanwhile, the work in [12] leveraged the assumption that the likelihood of finding optimal candidates decreases as the MV bitrate increases to produce a rate-ordered variant of Successive Elimination [13]. Their algorithm provides the optimal solution (in terms of coding efficiency) within the search set, but limits the complexity reduction that can be achieved.

In light of the observations brought up by [11] and [12], that selected MV distribution and MV bitrate surfaces should be taken into account when designing an ME search pattern, we explore the impact of the MV bitrate estimation on fast IME algorithms. Thus, our contributions are:

1. We bring evidence that previous IME search patterns are related to MV bitrate. To the best of our knowledge, no previous work explicitly explores this relation [4] to define the search pattern, but those patterns frequently seem to match the estimated bitrate cost surface;
2. A strategy to reduce IME complexity by explicitly using the bitrate component of the cost function as a criterion to eliminate distortion calculations on a per-candidate basis. Unlike existing fixed search patterns, our technique is flexible enough to be used in conjunction with other fast algorithms and also allows for different complexity reduction and quality targets by parameterizing the elimination threshold;
3. An evaluation and discussion considering Low Delay P (LDP) and Random Access (RA) configurations, combining our strategy with the TZS implementation of the VVC Test Model (VTM) as a case study.

This paper is organised as follows: we highlight key concepts of the ME process in Section 2 and then explain the proposed algorithm in Section 3. Section 4 details our experiments and displays the obtained results, which are followed by a brief discussion and possible future works in Section 5.

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and by the Brazilian Council for Scientific and Technological Development (CNPq) through Project Universal (437924/2018-1) and PQ grant (312077/2018-1).

To appear in Proc. ICASSP 2021, June 06-11, 2021, Toronto, Canada © 2020 IEEE
2. BACKGROUND
ME has three distinct steps: MV Prediction, Integer ME and Fractional ME [4]. MV Prediction uses the motion information of neighboring blocks to derive a Motion Vector Predictor (MVP), which defines the starting position of the subsequent search. Then, IME searches a region centered around the MVP for an MV that minimizes the cost function. Finally, Fractional ME performs a refinement process over the integer result. We focus on the second step, namely IME.

Equation (1) shows the cost function minimized during ME [14], where $r$ estimates the bitrate of the difference between a given MV ($\vec{mv}$) and the MVP ($\vec{mvp}$), $\lambda$ is the Lagrange multiplier that weights the bitrate component and $d$ computes the distortion between the "original" pixel block ($O$) given as input to the ME process and a "candidate" block ($C_{\vec{mv}}$), which is offset from the original block by the given MV.

$$ j(\vec{mv}) = d(C_{\vec{mv}}) + \lambda \cdot r(\vec{mv} - \vec{mvp}) \tag{1} $$

The Sum of Absolute Differences (SAD), shown in (2), is commonly used as the distortion metric due to its simplicity. However, it must be calculated for every evaluated candidate, which requires large sample blocks to be loaded from memory and makes it the most time-consuming computation in IME.

$$ d(C) = \sum_{i=1}^{m} \sum_{j=1}^{n} |C_{i,j} - O_{i,j}| \tag{2} $$

The bitrate estimation function, in turn, is implemented in the VTM as (3), where $g(v)$ denotes the length of a signed Exponential Golomb code for an integer value $v$ and each component of the MV difference $\vec{mvd} = \vec{mv} - \vec{mvp}$ is given in coordinates relative to the MVP. Fig. 1 displays the bitrate surface of a search region of × pixels centered around the MVP. We highlight that $r(\vec{mvd})$ can be efficiently implemented because it does not depend on the video's content and $g(v)$ can be calculated with a lookup table, since the search window size limits the range of possible values for $\vec{mvd}_x$ and $\vec{mvd}_y$ [15].

$$ r(\vec{mvd}) = g(\vec{mvd}_x) + g(\vec{mvd}_y) \tag{3} $$

Fig. 1: Bitrate surface for MV coordinates relative to the MVP. Black lines highlight bitrate values of 20 (biggest shape), 10 (cross-shaped contour) and 4 (small diamond).

Fig. 2 shows a flowchart of TZS, which is the default IME algorithm in VTM. Its first step chooses a starting point for the following search, similarly to MV Prediction but with added MVs. Around the starting MV, the First Search step expands a diamond. When the best candidate of the First Search is found at a distance greater than the raster step size, a Raster Search is performed to find a closer match. Finally, the Refinement Step iteratively expands a diamond centered on the current best candidate's position until it either finds a fixed point or makes a maximum number of attempts, in which case it returns the current best candidate.

Fig. 2: High-level flowchart of the TZS algorithm.
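As a rough illustration of (3), the sketch below computes g(v) as the length of a standard signed Exponential Golomb code and tabulates it for every component value inside a search window. The function names (`exp_golomb_bits`, `build_rate_table`, `mv_rate`) and the window half-size of 8 are assumptions made for this example only; the actual VTM routine may differ in details such as MV precision scaling.

```python
def exp_golomb_bits(v: int) -> int:
    """Length in bits of the signed Exp-Golomb code for integer v."""
    # Signed mapping: v > 0 -> codeNum = 2v - 1, v <= 0 -> codeNum = -2v.
    code_num = 2 * v - 1 if v > 0 else -2 * v
    # An order-0 Exp-Golomb code uses 2 * floor(log2(codeNum + 1)) + 1 bits.
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def build_rate_table(search_range: int) -> dict:
    """Precompute g(v) for every component value inside the search window."""
    return {v: exp_golomb_bits(v) for v in range(-search_range, search_range + 1)}


def mv_rate(mvd, table) -> int:
    """Equation (3): r(mvd) = g(mvd_x) + g(mvd_y), read from the lookup table."""
    return table[mvd[0]] + table[mvd[1]]


SEARCH_RANGE = 8  # hypothetical window half-size, chosen only for this example
table = build_rate_table(SEARCH_RANGE)

print(mv_rate((0, 0), table))   # rate at the MVP itself: 2
print(mv_rate((1, 0), table))   # direct neighbour of the MVP: 4
print(mv_rate((-3, 2), table))  # 5 + 5 = 10
```

Because the table depends only on the window size, it can be built once and reused for every block, which is what makes the rate term so much cheaper than a SAD computation.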
3. RATE-BASED CANDIDATE ELIMINATION
Although the TZS algorithm significantly reduces complexity when compared to an exhaustive search, related works show that it is still possible to narrow down the number of evaluated candidates with only a small loss in coding efficiency. Notably, previous works regarding ME in HEVC have shown that selected MVs are generally located around the predictor: it has been stated that 87% of them can be found after the TZS prediction step [16] and that over 94% of the best MVs are within a small, diamond-shaped range around the MVP [10].

Fig. 3 shows a heatmap with the spatial distribution of MVs chosen by the VTM encoder. When we create points by element-wise pairing the values from Fig. 1 with the log values of Fig. 3, a Pearson correlation coefficient of − . can be found, showing that the number of selected MVs exponentially decreases as the bitrate estimate increases. This allows us to conclude that most decisions are likely within a fraction of the search region where estimated bitrate values are smaller.

Fig. 3: Selected MV distribution averaged over the set of tested sequences (see Section 4). Brighter colors represent an exponentially higher number of decisions at a given point.

Therefore, our approach consists in reducing the average complexity of the cost function $j(\vec{mv})$ computation by skipping distortion computations for all MVs for which $r(\vec{mv} - \vec{mvp}) > t$, where $t$ is a threshold value. Three reasons led us to this approach: 1) the rate function can be efficiently calculated with lookup tables, so skipped blocks do not need to be fetched from memory; 2) there is a direct correlation between effectively selected candidate distributions and the vector rate surface; 3) the criterion can be applied on top of existing IME algorithms, and the threshold can be adjusted to suit a specific application's constraints.
A possible disadvantage would be its reliance on the efficacy of the MV prediction step.

When using this technique, the IME search window is expected to be constrained to a diamond-shaped region centered around the MVP, which is extended over its axes and has a radius proportional to the threshold $t$. For instance, when this candidate elimination strategy is used with $t = 20$, the search region has a similar shape to the one proposed in [11], but our strategy is applied to the entire TZS execution instead of only to the Raster step. In the end, although not able to guarantee the cost function minimization, the rate-constrained algorithm should manage to eliminate most block distortion computations while still evaluating the regions more likely to contain optimal candidates.
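The elimination criterion described above can be sketched as a filter placed in front of the distortion computation. This is a minimal illustration under stated assumptions, not the modified VTM code: the candidate list is a static full-search window (real TZS generates candidates adaptively), the frame is synthetic, and every name (`rate_constrained_search`, `sad`, `mv_rate`, `window`) is hypothetical.

```python
def exp_golomb_bits(v):
    """Length in bits of the signed Exp-Golomb code for integer v."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def mv_rate(mvd):
    """Equation (3): estimated bits for the MV difference (x, y)."""
    return exp_golomb_bits(mvd[0]) + exp_golomb_bits(mvd[1])


def sad(original, reference, x0, y0):
    """Equation (2): SAD between `original` and the reference block at (x0, y0)."""
    return sum(
        abs(reference[y0 + i][x0 + j] - original[i][j])
        for i in range(len(original))
        for j in range(len(original[0]))
    )


def rate_constrained_search(original, reference, pos, mvp, candidates, lam, t):
    """Evaluate `candidates`, skipping the SAD of any MV whose rate exceeds t."""
    best_mv, best_cost, sad_count = None, float("inf"), 0
    for mv in candidates:
        rate = mv_rate((mv[0] - mvp[0], mv[1] - mvp[1]))
        if rate > t:
            continue  # eliminated: the candidate block is never fetched
        dist = sad(original, reference, pos[0] + mv[0], pos[1] + mv[1])
        sad_count += 1
        cost = dist + lam * rate  # Equation (1)
        if cost < best_cost:
            best_mv, best_cost = mv, cost
    return best_mv, best_cost, sad_count


# Synthetic 16x16 reference frame; the 4x4 original block is a copy of the
# reference shifted by the MV (1, 0), so that MV has zero distortion.
reference = [[7 * x + 13 * y for x in range(16)] for y in range(16)]
pos, mvp = (6, 6), (0, 0)
original = [row[pos[0] + 1 : pos[0] + 5] for row in reference[pos[1] : pos[1] + 4]]
window = [(x, y) for y in range(-4, 5) for x in range(-4, 5)]

for t in (4, 10, 1000):
    mv, cost, count = rate_constrained_search(original, reference, pos, mvp, window, 1.0, t)
    print(f"t = {t}: best MV {mv}, cost {cost}, SAD computations {count}")
```

In this toy setup, t = 4 leaves only five candidates (the MVP and its four direct neighbours), while a very large threshold evaluates all 81 window positions. Since the filter only inspects the MV and the MVP, the same check could wrap the candidate evaluation of any search pattern.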
4. RESULTS AND DISCUSSION
In order to evaluate the elimination criterion proposed in the previous section, the TZS algorithm implemented in VTM 6.2 was modified to apply rate-based elimination. The experiments were performed using the LDP and RA configurations, setting InternalBitDepth to 8 instead of its default of 10. The video sequences analyzed are the Common Test Conditions (CTC) subset common to both HEVC [17] and VVC [18], and each sequence was encoded using Quantization Parameters (QPs) 22, 27, 32 and 37 in order to evaluate coding efficiency using the BD-Rate metric [19].

Code is available at https://gitlab.com/baioc/vtm/tree/rate-elimination

In this paper, the complexity $C$ of a video is defined as:

$$ C = \sum_{s \in S} \mathit{totalCandidates}(s) \times \mathit{area}(s) \tag{4} $$

In (4), $S$ is the set of all possible Coding Unit (CU) sizes in VVC, $\mathit{totalCandidates}(s)$ represents the total number of candidates with size $s$ for which distortion was calculated during IME and $\mathit{area}(s)$ is the area of that CU size. Considering the original VTM IME complexity as $C_{ori}$ and the IME complexity of a modified implementation as $C_{mod}$, the complexity reduction $\Delta C$, expressed as a percentage, is calculated as follows:

$$ \Delta C = \frac{C_{ori} - C_{mod}}{C_{ori}} \times 100 \tag{5} $$

This complexity metric, unlike time measurements, is not affected by compiler optimizations or machine specifications and thus allows for reproducible experiments and fair comparisons with other IME search algorithms.

We initially fixed the elimination threshold at $t = 4$ as this is the smallest value for which TZS will test candidates other than the MVP. Fig. 4 shows our results. Looking at RA, even though most test sequences have BD-Rate increases of less than 1%, these results show that $t = 4$ is too restrictive in some cases. Therefore, given that a larger search region should result in lower BD-Rates as it eliminates fewer candidates, we conducted additional experiments with the sequences that presented BD-Rate values above 1% while relaxing the elimination criterion by using $t$ equal to 10 and 20. Notably, $t = 20$ allows the effective search window (see Fig. 1) to approximate that of the Octagonal-axis pattern presented in [11], which is known to have near-zero BD-Rate increases. Thus, with the purpose of comparing our results with those of [11], we have replicated their search pattern, originally implemented for HEVC, in VTM, adapting it to any search window size and non-square dimensions.

Fig. 4: Rate-based candidate elimination results using the original TZS algorithm as a baseline. The chart includes all 17 tested sequences for both RA and LDP configurations.

Table 1 shows the results obtained when increasing the elimination threshold in the RA configuration and compares them with those of the Octagonal-axis raster pattern. Applying $t = 10$ suffices to decrease BD-Rate levels of three sequences to less than 1%, maintaining complexity reduction significantly higher than that of Octagonal-axis. For the other videos, $t = 20$ is required to further reduce BD-Rate and approximate the results of Octagonal-axis, while still having better complexity results.

Table 1: Complexity reduction (%) and BD-Rate increase (%) when using different thresholds in the RA configuration.

Sequence         t = 10          t = 20          Octagonal-axis
                 BDBR    ΔC      BDBR    ΔC      BDBR    ΔC
Cactus           0.55    39.0    0.12    28.8     0.02   26.2
BballDrill       0.60    35.8    0.14    23.4     0.01   22.2
BballDrillTxt    0.67    34.3    0.09    22.9    -0.04   21.2
SlideEdit        1.23    10.0    0.63     6.8     0.03    6.1
RHorses          1.53    38.0    0.35    25.1     0.02   22.5
SShow            2.88    51.0    0.88    42.5    -0.05   36.4
RHorsesC         3.02    47.7    0.70    33.9     0.10   30.2

Meanwhile, the LDP configuration shows promising results, with an overall BD-Rate below 1% for $t = 4$, strengthening the premise that good MV candidates can generally be found within the vicinity of the MVP. Table 2 shows complexity reduction and BD-Rate results for LDP, averaged by class. While the Octagonal-axis search reduces 16.4% of the IME complexity in the best case, our approach achieves complexity reductions of over 80% with negligible BD-Rate increase in most cases, the two screen-content sequences in class F being the only exceptions.

Table 2: Complexity reduction (%) and BD-Rate increase (%) per-class averages in the LDP configuration.

Class    t = 4           Octagonal-axis
         BDBR    ΔC      BDBR    ΔC
B        0.18    87.8     0.02   13.9
C        0.22    88.8     0.00   15.2
D        0.20    86.5     0.04   10.7
E        0.04    82.3    -0.04    6.5
F        3.44    87.3     0.37   16.4
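The complexity metric of (4) and its relative reduction in (5) amount to a weighted count of distortion computations, which the following sketch makes concrete. The candidate counts are made up purely for illustration and do not come from the experiments; the function names and the dictionary layout keyed by CU width and height are likewise assumptions.

```python
def ime_complexity(candidates_per_size: dict) -> int:
    """Equation (4): sum over CU sizes of evaluated candidates times CU area."""
    return sum(count * w * h for (w, h), count in candidates_per_size.items())


def complexity_reduction(c_ori: float, c_mod: float) -> float:
    """Equation (5): relative complexity reduction, as a percentage."""
    return (c_ori - c_mod) / c_ori * 100.0


# Made-up candidate counts, keyed by the (width, height) of the CU.
baseline = {(8, 8): 1000, (16, 16): 400, (32, 16): 100}
modified = {(8, 8): 200, (16, 16): 50, (32, 16): 10}

c_ori = ime_complexity(baseline)
c_mod = ime_complexity(modified)
delta = complexity_reduction(c_ori, c_mod)
print(f"C_ori = {c_ori}, C_mod = {c_mod}, reduction = {delta:.2f}%")
```

Weighting each candidate by the CU area reflects that a SAD over a larger block touches proportionally more samples, so the metric tracks the amount of pixel data processed rather than the raw number of candidates.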
5. CONCLUSIONS
In this paper, we have shown how MV bitrate can influence IME search patterns and proposed an algorithm that explicitly uses this relation to prune search regions through an efficient and simple criterion which is applied on a per-block basis. The main advantages of our proposal are its versatility, since it can be combined with existing IME search algorithms, and its low complexity, given that the bitrate estimation function can be easily computed with lookup tables.

Our experiments show that even when rate-based elimination is applied on top of TZS with a fixed threshold of 4, we are able to reduce IME complexity by approximately 86.69% when using the LDP configuration, with a small (0.74% average BD-Rate) coding efficiency loss. Although RA configuration results displayed a lower complexity reduction and a higher coding efficiency loss when using the same threshold, additional experiments showed that changing the threshold to 20 allows the BD-Rate increase to be kept below 1%, while still reducing complexity slightly further than the Octagonal-axis raster search pattern. This exemplifies how the algorithm can be configured for different trade-offs between complexity reduction and encoding efficiency.

We believe future works could propose modifications and improvements over this technique, as well as study its use on top of other fast algorithms. For instance, the identification of specific corner-case test sequences with increased BD-Rate points towards adaptive thresholds to accommodate content-dependent motion characteristics. Finally, since a small threshold value was able to produce satisfying results for the LDP configuration, hardware-accelerated implementations could apply a static elimination criterion (e.g. over FS) to drastically simplify the IME search, reducing it to a small region around the predictor in order to obtain significant reductions in circuit area, power and cost.
6. REFERENCES

[1] Cisco, "Cisco Visual Networking Index: Forecast and Trends, 2017–2022 White Paper," , 2019, Accessed: 20 Aug. 2019.
[2] Amy Watson, "Consuming media at home due to the coronavirus worldwide," , 2020, Accessed: 16 Jun. 2020.
[3] I. Siqueira, G. Correa, and M. Grellert, "Rate-Distortion and Complexity Comparison of HEVC and VVC Video Encoders," in , Feb 2020, pp. 1–4.
[4] Y. Zhang, C. Zhang, and R. Fan, "Fast Motion Estimation in HEVC Inter Coding: An Overview of Recent Advances," in , Nov 2018, pp. 49–56.
[5] Hadi Amirpour, Mohammad Ghanbari, Antonio Pinheiro, and Manuela Pereira, "Motion estimation with chessboard pattern prediction strategy," Multimedia Tools and Applications, vol. 78, no. 15, pp. 21785–21804, 2019.
[6] Shan Zhu and Kai-Kuang Ma, "A new diamond search algorithm for fast block-matching motion estimation," IEEE Transactions on Image Processing, vol. 9, no. 2, pp. 287–290, Feb 2000.
[7] C. Rosewarne, K. Sharman, R. Sjöberg, and G. J. Sullivan, "High Efficiency Video Coding (HEVC) Test Model 16 (HM 16) Encoder Description Update 12," Document JCTVC-AK1002, JCT-VC of ITU-T, Geneva, Oct 2019.
[8] J. Chen, Y. Ye, and S. Kim, "Algorithm description for Versatile Video Coding and Test Model 6 (VTM 6)," Document JVET-O2002, JVET of ITU-T, Gothenburg, Jul 2019.
[9] Nghia Doan, Tae Sung Kim, Chae Eun Rhee, and Hyuk-Jae Lee, "A hardware-oriented concurrent TZ search algorithm for High-Efficiency Video Coding," EURASIP Journal on Advances in Signal Processing, pp. 1–17, Nov 2017.
[10] R. Fan, Y. Zhang, and B. Li, "Motion Classification-Based Fast Motion Estimation for High-Efficiency Video Coding," IEEE Transactions on Multimedia, vol. 19, no. 5, pp. 893–907, May 2017.
[11] P. Goncalves, M. Porto, B. Zatt, L. Agostini, and G. Correa, "Octagonal-Axis Raster Pattern for Improved Test Zone Search Motion Estimation," in , Apr 2018, pp. 1763–1767.
[12] L. Trudeau, S. Coulombe, and C. Desrosiers, "Cost-Based Search Ordering for Rate-Constrained Motion Estimation Applied to HEVC," IEEE Transactions on Broadcasting, vol. 64, no. 4, pp. 922–932, Dec 2018.
[13] W. Li and E. Salari, "Successive elimination algorithm for motion estimation," IEEE Transactions on Image Processing, vol. 4, no. 1, pp. 105–107, Jan 1995.
[14] G. J. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74–90, 1998.
[15] M. Z. Coban and R. M. Mersereau, "A fast exhaustive search algorithm for rate-constrained motion estimation," IEEE Transactions on Image Processing, vol. 7, no. 5, pp. 769–773, May 1998.
[16] P. Goncalves, G. Correa, M. Porto, B. Zatt, and L. Agostini, "Multiple early-termination scheme for TZ search algorithm based on data mining and decision trees," in 2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)