Performance of AV1 Real-Time Mode
Ludovic Roux
CoSMo Software
[email protected]
Alexandre Gouaillard
CoSMo Software
Abstract: With COVID-19, interest in digital interactions has risen, putting real-time (or low-latency) codecs in a new light. Codec research has traditionally focused on coding efficiency, while very little literature exists on real-time codecs. It is shown how the speed at which content is made available impacts both latency and throughput. The authors introduce a new test setup, integrating a paced reader, which allows a codec to run under the same conditions as real-time media capture. Quality measurements using VMAF, as well as multiple speed measurements, are made on encodings of HD and full HD video sequences, at both 25 fps and 50 fps, to compare the respective performance of several implementations of the H.264, H.265, VP8, VP9 and AV1 codecs.
Index Terms: Real-time video codecs, video encoding performance
I. INTRODUCTION
A. Codec Use Cases
The term codec usually refers to algorithms that encode, respectively decode, a binary representation of media.

Currently there are arguably three major use cases for large-scale codec usage:
• encoding of original raw content, client-side,
• transcoding of content, server-side,
• decoding of content on the receiving side.

Codecs are usually evaluated based on coding efficiency [1] [2], i.e. the compression/quality ratio achieved against run time. Comparisons are made using the Bjøntegaard rate difference (BD-rate) [3] [4], and multiple papers exist to guide researchers in choosing the most representative dataset, the most meaningful metric, and the best representation of results [5].

For example, the Xiph.org Foundation has developed the very extensive AreWeCompressedYet [6] automatic service to enable comparisons between different implementations of video codecs using various metrics.

With COVID-19 and the new normal, people have a new appetite for more interactive uses of streaming, to get the same experience and value they were enjoying in real life. Correspondingly, this has increased the demand for faster-than-live streaming (“live” being 5 seconds behind real-time), to reach the sub-500 ms level of latency where human interactions thrive. In parallel, the rise of cloud gaming, augmented reality (AR) and virtual reality (VR) has been pushing video codecs in the same direction, albeit with generated content instead of natural content and an even stricter latency expectation (150 ms maximum) [7].

In this paper we focus on the specific use case of real-time consumption of media for interactive applications. This use case puts a specific emphasis on latency, making coding efficiency a secondary target.

B. Pre-Recorded Content Encoding
The encoding throughput is the total number of frames of the input divided by the total duration of the encoding process. The encoding latency is the time it takes for a single frame to go through the encoding process.

When the content is readily available, the encoding process can be distributed, reducing the encoding time. The total run time used in coding efficiency computation includes both latency and processing time, diluting the latency. Latency should be reported separately to be comparable.

In Video-on-Demand (VOD), latency is usually negligible for both encoder and decoder compared to the encoding time. More often than not, latency is not taken into account, and the main parameter of the encoder is the speed (1 to 8), which represents a compromise between coding efficiency and complexity. Implicitly, the quality follows the speed, although not linearly.

As such, the delay or latency induced by the encoding and decoding processes is almost never taken into account. In fact, assuming latency is negligible and the entire media is available, many encoders use a 2-pass approach to gain more efficiency per file/movie. Netflix is known to have pioneered distributed “per-chunk” and “per-shot” techniques, investing time and resources in the encoding process in exchange for maximum efficiency [8].

The most recent codec performance study [9] compared the libaom AV1 encoder against x265 and libvpx-vp9 using their best quality mode and two-pass compression.
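As a minimal sketch of the two definitions that open this subsection, the helpers below compute throughput and per-frame latency from timestamps. The function names and the timestamp lists are illustrative assumptions and do not correspond to any encoder's API.

```python
def encoding_throughput_fps(num_frames, total_seconds):
    """Throughput: total number of input frames divided by total encoding duration."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return num_frames / total_seconds


def frame_latencies_ms(submit_times_s, output_times_s):
    """Latency of each frame: time between submitting it and receiving its output.

    Both arguments are hypothetical per-frame timestamps in seconds, in frame order.
    """
    if len(submit_times_s) != len(output_times_s):
        raise ValueError("one output timestamp is needed per submitted frame")
    return [(out - sub) * 1000.0 for sub, out in zip(submit_times_s, output_times_s)]


# Hypothetical run: 250 frames encoded in 5 seconds.
print(encoding_throughput_fps(250, 5.0))
# Hypothetical timestamps for two frames.
print(frame_latencies_ms([0.0, 0.5], [0.125, 0.75]))
```

Keeping the two quantities separate makes it visible when a high frames-per-second figure hides a long per-frame delay.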
C. Live Content Capture Speed
A big difference between recorded content and real-time content is the speed at which frames become available and the corresponding impact on latency. For example, real-time or live media needs to be encoded in one pass and not in two or more passes, as algorithms focused on coding efficiency would do.

VMAF  MOS  ACR label  Error visibility
100   5    excellent  Imperceptible
80    4    good       Perceptible error, not annoying
60    3    fair       Visible error, slightly annoying
40    2    poor       Visible error, annoying
20    1    bad        Visible error, very annoying
TABLE I: Mapping of VMAF scores to MOS values
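The anchor points of Table I can be turned into a small lookup. The sketch below assumes a linear mapping between anchors (MOS = VMAF / 20), clamps scores below 20 to MOS 1, and rounds half up to the nearest ACR label. These interpolation choices are illustrative assumptions that go beyond the five anchors given in the table, and the function names are hypothetical.

```python
import math

# ACR labels indexed by integer MOS value, as listed in Table I.
ACR_LABELS = {1: "bad", 2: "poor", 3: "fair", 4: "good", 5: "excellent"}


def vmaf_to_mos(vmaf):
    """Map a VMAF score in [0, 100] to a MOS value in [1, 5].

    Assumption: linear interpolation between the Table I anchors,
    with everything below VMAF 20 clamped to MOS 1.
    """
    if not 0.0 <= vmaf <= 100.0:
        raise ValueError("VMAF scores lie in the range [0, 100]")
    return max(1.0, vmaf / 20.0)


def acr_label(vmaf):
    """Return the ACR label of the nearest Table I anchor (ties round up)."""
    return ACR_LABELS[math.floor(vmaf_to_mos(vmaf) + 0.5)]


for score in (20, 45, 80, 100):
    print(score, vmaf_to_mos(score), acr_label(score))
```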
1) Real-time encoder cannot encode faster than real-time:
Even if the encoder were able to encode one frame at a time at a faster-than-real-time speed, it would still need to wait for the capturer to provide the next frame before it could process it.
2) Real-time encoder latency is correlated with buffer depth:
Take the use case of a frame buffer that is 60 frames deep. With live content, one needs to wait until those frames are generated before starting any kind of processing. At a capture rate of 30 fps, one must wait 2 seconds before the buffer is ready to be used, even if the encoder can then encode faster than 30 fps.

To be able to compare traditional codecs (default settings) and the real-time modes of certain codec implementations under the same conditions, we need to make sure that the frames are delivered to the encoder at real-time speed even if the content is pre-recorded.
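The two observations of this subsection reduce to simple arithmetic, sketched below. The function names are hypothetical, and the 200 fps encoder in the second call is an invented figure used only for illustration.

```python
def buffer_fill_delay_s(buffer_depth_frames, capture_fps):
    """Time to wait before a frame buffer of the given depth is full with live content."""
    return buffer_depth_frames / capture_fps


def paced_throughput_fps(encoder_max_fps, capture_fps):
    """With live capture, the encoder cannot run faster than frames arrive."""
    return min(encoder_max_fps, capture_fps)


# The example from the text: a 60-frame buffer at a capture rate of 30 fps.
print(buffer_fill_delay_s(60, 30))
# Hypothetical encoder able to reach 200 fps, fed by a 25 fps capture.
print(paced_throughput_fps(200.0, 25.0))
```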
D. Codecs Performance in Real-Time Use Cases
The run time is latency plus encoding time, and the latency is a function of both the depth of any frame buffer and the constant capture rate. Increasing the frame rate reduces latency but increases the workload. The easiest way to reduce the latency is to reduce the size of the frame buffer, or to remove the need for a frame buffer altogether.

This is not to be confused with the encoder's speed setting. Most recent codec designs involve sub-algorithms, referred to as “tools.” Some tools are more demanding than others in terms of complexity or latency, and not all have the same impact on efficiency. As described in [10], Figure 1, motion estimation is the dominant contributor to the run time budget in encoding.

The speed setting maps to the choice of a specific subset of tools, defining a certain trade-off between coding efficiency and encoding speed (possibly disregarding latency completely). Some codecs also have an explicit real-time mode, with settings that govern the trade-off between coding efficiency and latency.

This comes at a cost in terms of coding efficiency, and makes the real-time mode of codecs not directly comparable using rate-compression graphs. There is no study to date about how much coding efficiency is traded off for decreased latency.

In this paper, we concentrate on the real-time mode of some of the encoder implementations for H.264 (AVC), VP8, VP9, and AV1, which are all used in the webrtc.org code today. An implementation of H.265 (HEVC) is also used for comparison.

[Fig. 1 panels: (a) Blue sky (BS25), (b) Pedestrian area (PA25), (c) Riverbed (RB25), (d) Rush hour (RH25), (e) Station2 (ST25), (f) Sunflower (SF25), (g) Tractor (TR25), (h) Crowd run (CR50), (i) Ducks take off (DT50), (j) In to tree (IT50), (k) Old town cross (OT50), (l) Park joy (PJ50)]
Fig. 1: Snapshots of each video sequence

II. METHODOLOGY
A. Quality Metric
Compression can be objectively evaluated. Quality of video frames can also be evaluated objectively; however, it has been shown that scores provided by objective metrics like PSNR (Peak Signal to Noise Ratio) [11] correlate poorly with human evaluation of image or video quality [12]. To address this drawback, perceptual metrics like VMAF (Video Multimethod Assessment Fusion) [13], [14] were introduced. A recent study on the evaluation of objective video quality metrics has demonstrated a good correlation between subjective scores given by humans and VMAF scores [15]. We have followed the latest trend in codec research and used VMAF in this study.

To give an interpretation of a VMAF score, one can relate it to the typical Mean Opinion Score (MOS) value ranging from 1 to 5. A very common rating scale for MOS is the Absolute Category Rating (ACR) methodology [16]: “bad,” “poor,” “fair,” “good” and “excellent.” VMAF gives a score in the range [0, 100]. A VMAF score of 20 can be mapped to “bad,” 40 to “poor,” 60 to “fair,” 80 to “good,” and 100 to “excellent” [17]. Table I summarizes the relationship between VMAF scores and MOS values.

TABLE II: List of 12 video sequences used in this study (columns: Filename, Short name, fps, Duration)

B. Datasets
For easier comparison of results, we have used video sequences having the same resolution, the same color space and the same bit depth. Only the duration and the frame rate differ from one video to another.

We have focused on 1080p HD video. This is the resolution recommended to compute VMAF scores using the default model v0.6.1 [17]. Table II gives the list of the 12 videos used in our study.

All the video sequences use the YUV format with a bit depth of 8 bits and are not compressed. They have been selected from the publicly available Xiph.org Video Test Media [derf’s collection] dataset.

There are two groups of videos: a group of 7 videos with a frame rate of 25 fps, and a group of 5 videos with a frame rate of 50 fps. Fig. 1 shows a snapshot of each video.
C. Video Codecs
We compare the performance of eight encoders, namely aomenc (default and real-time settings) and SvtAv1EncApp for AV1, vpxenc for VP8 and VP9, x265 for HEVC, and x264 and h264enc for H.264, compiled in their real-time mode when available, using speed 8 when applicable, and using various bitrate targets.

Table III gives, for each codec, all the information needed to reproduce the results: the version we have used, where to get the source code, and which options we selected to compile it.

To give an insight into the difference in encoding performance between real-time mode and non-real-time mode, for AV1 only, we have compiled a second version of the AOM encoder aomenc using the option -DCONFIG_REALTIME_ONLY=0. For this version we have selected the speed option --cpu-used=3.

For the real-time version of aomenc, we have selected the highest speed option --cpu-used=8. We refer to aomenc in real-time mode as aomenc-rt (encoder called with option --rt), and to aomenc not in real-time mode as aomenc-good (encoder called with the good quality option --good).

We also selected the highest speed option --preset 8 for SVT-AV1.

We selected the setting --preset superfast for x265, and the setting --preset medium for x264. In both cases, this was enough to reach the maximum target of 50 fps.

The options used at run time to launch each codec are given in Table IV.

Compilation of encoders and encoding of videos have been performed on a Dell™ OptiPlex 5050 with processor Intel® Core™ i7-7700T 8 cores at 2.90 GHz and 16 GB memory, running the Ubuntu Desktop 20.04.1 64-bit operating system.

Footnote: Xiph.org Video Test Media [derf’s collection], https://media.xiph.org/video/derf/
D. Real-Time Encoding Evaluation Process
The evaluation of a video encoder is usually performed by letting the encoder read a video file to be encoded. This is not a realistic mode of operation for real-time, where the frames to be encoded are only available after some delay. For example, when encoding frames received from a camera at a rate of 25 fps, each new frame is available only after a delay of 1/25 second, that is 40 ms. Even if the encoder is able to encode a frame in 5 ms, it has to wait until the next frame becomes available. We are starving the encoder.

To evaluate the encoders in a more realistic process, we introduce a pacer program between the video file and the encoder (see Fig. 2). The objective of the pacer is to deliver frames to the encoder only at a selected frame rate.

A raw video file to be encoded is read by the pacer program. The pacer is in charge of simulating the real-time delivery of images at a selected rate. When the pacer reads frames from a 25 fps video file, it outputs a frame every 1/25th second.

Codec | Encoder and version | Source code | Configuration options
AV1 | libaom 2.0.0 | https://aomedia.googlesource.com/aom/ | cmake -DCMAKE_BUILD_TYPE=Release -DCONFIG_MULTITHREAD=1 -DCONFIG_PIC=1 -DCONFIG_REALTIME_ONLY=1 -DCONFIG_RUNTIME_CPU_DETECT=1 -DCONFIG_WEBM_IO=0
AV1 | SVT-AV1 0.8.4 | https://github.com/OpenVisualCloud/SVT-AV1 | build.sh release
VP8, VP9 | libvpx 1.9.0 | https://chromium.googlesource.com/webm/libvpx | configure --enable-pic --enable-realtime-only --enable-multi-res-encoding --disable-debug --cpu=x86-64
H.265 | x265 release 3.5 | https://github.com/videolan/x265 | cmake -DCMAKE_BUILD_TYPE=Release
H.264 | x264 stable cde9a933 | https://code.videolan.org/videolan/x264 | make
H.264 | openh264 2.1.1 | https://github.com/cisco/openh264 | make OS=linux ARCH=x86_64
TABLE III: Video encoders used in this study
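The pacer of Section II-D can be sketched as follows. This is a minimal illustration and not the authors' program: it assumes headerless raw 8-bit YUV 4:2:0 input, uses sleep-based timing, and all function and parameter names are hypothetical. The `clock` and `sleep` parameters exist only so that the timing logic can be exercised without waiting.

```python
import io
import time


def frame_size_yuv420p8(width, height):
    """Bytes per frame for raw 8-bit YUV 4:2:0 (assumed input layout)."""
    return width * height * 3 // 2


def pace_frames(src, dst, frame_bytes, fps, clock=time.monotonic, sleep=time.sleep):
    """Copy whole frames from src to dst, releasing one frame every 1/fps seconds.

    Deadlines are computed from the start time so that timing errors do not
    accumulate. Returns the number of frames delivered.
    """
    interval = 1.0 / fps
    start = clock()
    delivered = 0
    while True:
        frame = src.read(frame_bytes)
        if len(frame) < frame_bytes:  # end of input (or a trailing partial frame)
            break
        # Frame n becomes available at start + (n + 1) * interval, as from a camera.
        delay = start + (delivered + 1) * interval - clock()
        if delay > 0:
            sleep(delay)
        dst.write(frame)
        dst.flush()  # hand the frame to the reader immediately
        delivered += 1
    return delivered


# Tiny self-check with in-memory streams: three 4x2 frames at a high rate.
_size = frame_size_yuv420p8(4, 2)
_out = io.BytesIO()
print(pace_frames(io.BytesIO(bytes(_size * 3)), _out, _size, fps=1000))
```

In an actual run, `src` would be the raw video file and `dst` the write end of the pipe read by the encoder. Sleep-based pacing is only as precise as the operating system's timer, which is one reason this is a sketch rather than a measurement tool.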
Codec | Encoder command options
aomenc-rt | aomenc --codec=av1 --profile=0 --kf-max-dist=90000 --end-usage=cbr --min-q=1 --max-q=63 --undershoot-pct=50 --overshoot-pct=50 --buf-sz=1000 --buf-initial-sz=500 --buf-optimal-sz=600 --max-intra-rate=300 --passes=1 --rt --lag-in-frames=0 --error-resilient=0 --tile-columns=0 --aq-mode=3 --enable-obmc=0 --enable-global-motion=0 --enable-warped-motion=0 --deltaq-mode=0 --enable-tpl-model=0 --mode-cost-upd-freq=2 --coeff-cost-upd-freq=2 --enable-ref-frame-mvs=0 --mv-cost-upd-freq=3 --enable-order-hint=0 --cpu-used=8 --threads=8 --end-usage=cbr --target-bitrate=xxx --fps=25/1
aomenc-good | aomenc --codec=av1 --good --passes=1 --cpu-used=3 --threads=8 --lag-in-frames=25 --min-q=0 --max-q=63 --auto-alt-ref=1 --kf-max-dist=150 --kf-min-dist=0 --drop-frame=0 --static-thresh=0 --arnr-maxframes=7 --arnr-strength=5 --sharpness=0 --undershoot-pct=100 --overshoot-pct=100 --frame-parallel=0 --tile-columns=0 --profile=0 --target-bitrate=xxx --fps=25/1
SVT-AV1 | SvtAv1EncApp --tbr xxx --fps 25 --preset 8 --pred-struct 0 --profile 0 --rc 2 --min-qp 1 --max-qp 63 --vbv-bufsize 1 --tile-columns 0 --enable-global-motion 1 --enable-local-warp 0 --adaptive-quantization 0
VP8 | vpxenc --codec=vp8 --lag-in-frames=0 --error-resilient=0 --kf-max-dist=90000 --static-thresh=0 --end-usage=cbr --undershoot-pct=50 --overshoot-pct=50 --buf-sz=1000 --buf-initial-sz=500 --buf-optimal-sz=600 --max-intra-rate=300 --resize-allowed=0 --drop-frame=0 --passes=1 --rt --noise-sensitivity=0 --cpu-used=-6 --threads=8 --min-q=1 --max-q=63 --screen-content-mode=0 --target-bitrate=xxx --fps=25/1
VP9 | vpxenc --codec=vp9 --lag-in-frames=0 --error-resilient=0 --kf-max-dist=90000 --static-thresh=0 --end-usage=cbr --undershoot-pct=50 --overshoot-pct=50 --buf-sz=1000 --buf-initial-sz=500 --buf-optimal-sz=600 --max-intra-rate=300 --resize-allowed=0 --drop-frame=0 --passes=1 --rt --noise-sensitivity=0 --cpu-used=7 --threads=8 --profile=0 --min-q=1 --max-q=63 --tile-columns=0 --aq-mode=3 --target-bitrate=xxx --fps=25/1
x265 | x265 --frame-threads 0 --preset superfast --bitrate xxx --fps 25
x264 | x264 --preset medium --bitrate xxx --fps 25 --demuxer raw
openh264 | h264enc -rc 0 -complexity 2 -denois 0 -scene 0 -bgd 0 -fs 0 -numl 1 -tarb 0 xxx -frout 0 25
TABLE IV: Options used for encoders at run time (encoding has been performed at ten target bitrates: 800, 900, 1000, 1250, 1500, 1750, 2000, 2500, 5000 and 10000 kbps)

For a 50 fps video file, the pacer outputs a frame every 1/50th second. Frames delivered by the pacer at the selected rate are written to a Unix pipe. The video encoder being tested reads frames from the Unix pipe. The output of the encoder is a bitstream corresponding to the encoded video file. Finally, we evaluate the quality of the encoded video using the VMAF video quality assessment tool.

III. RESULTS AND ANALYSIS
We measured the VMAF scores for each encoder at ten different target bitrates: 800, 900, 1000, 1250, 1500, 1750, 2000, 2500, 5000 and 10000 kbps (see VMAF graphs in Figures 4a to 4l), and computed the BD-rates from the VMAF curves according to bitrate (see Tables V and VI).
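The sweep over the ten target bitrates can be sketched by expanding the "xxx" placeholder of Table IV. The two option strings below are copied from Table IV. The helper names are hypothetical, and input and output file arguments are deliberately left out because Table IV does not list them.

```python
import shlex

TARGET_BITRATES_KBPS = [800, 900, 1000, 1250, 1500, 1750, 2000, 2500, 5000, 10000]

# Option strings as given in Table IV; "xxx" stands for the target bitrate.
TEMPLATES = {
    "x265": "x265 --frame-threads 0 --preset superfast --bitrate xxx --fps 25",
    "x264": "x264 --preset medium --bitrate xxx --fps 25 --demuxer raw",
}


def expand(template, bitrate_kbps, fps):
    """Return the argument list with the bitrate placeholder and the fps value filled in."""
    tokens = shlex.split(template)
    args = []
    for i, token in enumerate(tokens):
        if token == "xxx":
            args.append(str(bitrate_kbps))
        elif i > 0 and tokens[i - 1] == "--fps":
            args.append(str(fps))
        else:
            args.append(token)
    return args


def sweep(fps):
    """One argument list per (encoder, target bitrate) pair."""
    return {
        (name, kbps): expand(template, kbps, fps)
        for name, template in TEMPLATES.items()
        for kbps in TARGET_BITRATES_KBPS
    }


for key, args in sweep(25).items():
    print(key, " ".join(args))
```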
A. Pacer: Latency Impact of Real-Time Streams
We measured the encoding throughput first without using the pacer (the encoder reads the frames directly from the video file), which, supposing the encoding speed is limited by the encoder and not by the I/O speed, provides the maximum speed at which the encoder can operate given the input content, the bitrate target, and the hardware. We then measured the throughput using the pacer.

Video   x265
BS25    7.63
PA25    16.10
RB25    negative (digits lost)
RH25    16.22
ST25    40.99
SF25    34.49
TR25    20.84
Avg 25  19.33
CR50    negative (digits lost)
DT50    22.54
IT50    7.52
OT50    12.56
PJ50    14.10
Avg 50  11.01
TABLE V: BD-rate for aomenc-rt8 (Note: SVT = SVT-AV1). The original columns are Video, openh264, x264, VP8, VP9, x265 and SVT; only the x265 column is fully legible in the source text. All openh264, x264, VP8 and VP9 entries are negative and all SVT entries are positive.

Fig. 2: Evaluation of video encoder in real-time mode

Video   x264   VP8
BS25    1.36   3.90
PA25    4.41   6.62
RB25    14.91  13.17
RH25    4.51   4.30
ST25    0.61   3.98
SF25    0.34   2.88
TR25    3.75   4.14
Avg 25  4.27   5.57
CR50    8.06   12.78
DT50    5.81   9.99
IT50    5.02   4.85
OT50    3.03   7.11
PJ50    1.75   8.33
Avg 50  4.73   8.61
TABLE VI: BD-VMAF for aomenc-rt8 (Note: SVT = SVT-AV1). The original columns are Video, openh264, x264, VP8, VP9, x265 and SVT; only the x264 and VP8 columns are fully legible in the source text. All openh264 and VP9 entries are positive, all SVT entries are negative, and the x265 entries are negative except for RB25 and CR50.

Fig. 3a and 3b show the encoding throughput achieved by the encoders when they are not limited by the pacer. At the threshold bitrate of 2500 kbps or less, all the encoders are able to deliver 25 fps or more on our test computer, except SVT-AV1, which at best reaches only 5.3 fps.

It is interesting to note that, when requested to encode for a rate of 50 fps instead of 25 fps, all the encoders perform their compression faster, although three of the five videos in the 50 fps group are rather difficult to encode. This increase in compression speed allows x264 and x265 to reach 50 fps at the threshold bitrate of 2500 kbps, while VP9 and aomenc-rt are not able to deliver more than 30 fps.

Generally, the encoders work at a more homogeneous speed at higher bitrates, as shown by the shorter standard deviation bars, while the measured frame rate varies much more at lower bitrates. In fact, at low bitrates, the encoders process videos that are easy to encode (Blue-sky, Pedestrian-area, Rush-hour, Station2, Sunflower, Tractor, In-to-tree, Old-town-cross) much faster than videos that are harder to encode (Riverbed, Crowd-run, Ducks-take-off, Park-joy).

[Fig. 3 panels: (a) Average encoding speed of the seven 25 fps videos without pacer, (b) Average encoding speed of the five 50 fps videos without pacer, (c) Average encoding speed of the seven 25 fps videos with pacer, (d) Average encoding speed of the five 50 fps videos with pacer]
Fig. 3: Average encoding speed of 1080p videos on a Dell™ OptiPlex 5050 with processor Intel® Core™ i7-7700T 8 cores at 2.90 GHz and 16 GB memory running Ubuntu Desktop 20.04.1 64 bits operating system.

Apart from the settings, these are the conditions under which most codec studies are done and performance is reported.

One can see that in the case of x264 and x265, latency is the bottleneck of the throughput and not the encoding speed, as they can encode 50 fps media content faster than 25 fps content. Supposing a constant I/O speed, the time to read a frame (latency) is constant whether the media content was originally captured at 25 or 50 fps; however, the complexity per second (linear in the number of pixels to encode) is twice as high.

The opposite holds for libvpx, for which we can clearly see that encoding 50 fps content is slower than encoding 25 fps content, with both VP8 and VP9.

Introducing the pacer, we measured the frame rates shown in Fig. 3c and 3d.

All the encoders which could deliver more than 25 fps in the previous experiment have no problem delivering with a paced input. SVT-AV1 remains far too slow for real-time.

All the encoders which could deliver more than 50 fps in the previous experiment, like VP8 and x264, again have no problem. Interestingly, x265 performs extremely well.

It is also interesting to note that aomenc-rt is slightly faster than VP9, while one would expect the opposite. For now, we can conclude that there is no CPU penalty when moving from VP9 to AV1 real-time.
B. Interpretation of BD-rate and BD-VMAF
A BD-rate is a measure of the average percentage bitrate savings that can be obtained for the same visual quality level. This measure is computed over the range of quality levels that are common to two curves.

For example, let us consider the VMAF scores in Fig. 4i for the Ducks-take-off (DT50) video. We want to compute the bitrate savings at the same VMAF level of aomenc-rt8 (blue curve, VMAF range from 27 to 61) as compared to x264 (orange curve, VMAF range from 6 to 55). The common VMAF range for these two curves is 27 to 55. Using that common quality range, we compute the average bitrate savings by calculating the area between the curves (to the left of the x264 orange curve and to the right of the aomenc-rt8 blue curve), and we divide it by the area to the left of the aomenc-rt8 blue curve up to the right of the y-axis. We get a BD-rate of −21.68% as shown in Table V. It is a negative value, which means that there is a reduction in bitrate for aomenc-rt8 as compared to x264. The interpretation of this value is that, for the same VMAF score, we may expect aomenc-rt8 to give on average a 21.68% bitrate saving as compared to x264.

Similarly, we can compute the average visual quality improvement for the same bitrate between aomenc-rt8 and x264 by switching the variables. Using the same example, we look for the common bitrate range between aomenc-rt8 (blue curve, bitrate range from 2160 to 9925 kbps) and x264 (orange curve, bitrate range from 810 to 10000 kbps). The common bitrate range for these two curves is 2160 to 9925 kbps. Using that common bitrate range, we compute the average VMAF improvement by calculating the area between the curves (below the aomenc-rt8 blue curve and above the x264 orange curve), and we divide it by the area below the x264 orange curve down to the x-axis. We get a BD-VMAF of 5.81 as shown in Table VI. It is a positive value, which means that there is an increase in VMAF score for aomenc-rt8 as compared to x264.
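The area-based reading of BD-rate given above can be sketched as follows. This is a simplified illustration, not the computation used for Tables V and VI: it assumes piecewise-linear interpolation on a linear bitrate axis in place of the curve fitting of Bjøntegaard's method [3], and it normalizes by the anchor curve so that the result reads as a saving relative to the anchor. That normalization choice is an assumption. The sample curves are invented.

```python
def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing, x is clamped."""
    x = min(max(x, xs[0]), xs[-1])
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("a curve needs at least two points")


def _mean_bitrate(curve, lo, hi, steps=1000):
    """Average bitrate of a list of (bitrate_kbps, vmaf) points over VMAF in [lo, hi]."""
    points = sorted((q, r) for r, q in curve)
    qs = [q for q, _ in points]
    rs = [r for _, r in points]
    h = (hi - lo) / steps
    area = 0.0
    for i in range(steps):  # trapezoidal rule along the quality axis
        a = lo + i * h
        area += 0.5 * (_interp(a, qs, rs) + _interp(a + h, qs, rs)) * h
    return area / (hi - lo)


def bd_rate_percent(test_curve, anchor_curve):
    """Average bitrate difference (%) of test vs. anchor over their common VMAF range."""
    lo = max(min(q for _, q in test_curve), min(q for _, q in anchor_curve))
    hi = min(max(q for _, q in test_curve), max(q for _, q in anchor_curve))
    if hi <= lo:
        raise ValueError("the two curves have no common quality range")
    test = _mean_bitrate(test_curve, lo, hi)
    anchor = _mean_bitrate(anchor_curve, lo, hi)
    return 100.0 * (test - anchor) / anchor


# Invented (bitrate_kbps, vmaf) samples, for illustration only.
test = [(500, 20), (1500, 60)]
anchor = [(1000, 20), (3000, 60)]
print(bd_rate_percent(test, anchor))
```

A negative result means the test encoder needs less bitrate than the anchor for the same VMAF score. Swapping the roles of bitrate and VMAF in the same routine gives the corresponding average quality difference.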
The interpretation of this BD-VMAF value is that, for the same bitrate, we may expect aomenc-rt8 to give on average a VMAF score 5.81 points higher than x264.

C. Discussion
VMAF scores are reported in Fig. 4a to 4l. We have highlighted a threshold of 2500 kbps for bitrate, as this value is known to be a hard-coded maximum for WebRTC within the Chromium browser, although native applications using WebRTC are not subject to this limitation.
1) 25 fps datasets:
VMAF scores show that all the videos of the 25 fps group except Riverbed are relatively easy to encode. The curves are close or very close to the perfect score of 100 for most bitrates. It is only at low bitrates of roughly 1000 kbps or lower that the VMAF score drops below 80. The quality of videos encoded by openh264 starts to degrade at higher bitrates, and degrades faster, than for the other encoders. VP8 also shows a slightly lower video quality than the other encoders. At the threshold bitrate of 2500 kbps, the quality of encoded videos is excellent or good, with a VMAF score above 80.

Riverbed is the only video of the 25 fps group that is difficult to encode with good quality. Although this video is static, the waves on the surface of the water and the reflection of light on the waves are challenging for encoders. Openh264 and VP8 are not able to encode this video at the threshold bitrate of 2500 kbps. The lowest bitrate delivered by openh264 for Riverbed is 3760 kbps, while it is 3460 kbps for VP8. The VMAF rating at the threshold bitrate is only between 30 and 50, which is poor quality.
2) 50 fps datasets:
The group of 50 fps videos was more difficult to encode with good quality than the group of 25 fps videos. Ducks-take-off was the most challenging video to encode. Interestingly, like Riverbed, it is a static video showing water with waves.

At the threshold bitrate of 2500 kbps, the VMAF score is lower than 80 for the videos in the 50 fps group, except for Old-town-cross, which has very slow motion.

We notice again that openh264 provides noticeably lower VMAF scores than the other encoders. There are three videos, Crowd-run, Ducks-take-off and Park-joy, which openh264 and VP8 are unable to encode at the threshold bitrate of 2500 kbps. They would require a target bitrate of 6000 kbps or more.

The other encoders, aomenc-rt8, SVT-AV1, x265, x264 and VP9, are able to encode all twelve videos with rather similar quality. The quality of videos encoded by SVT-AV1 is always the best, x265 comes second and aomenc-rt8 is third.

From Table VI giving BD-VMAF for aomenc-rt8, we can get the ranking of the encoders relative to aomenc-rt8. We can see that, for the same bitrate, the VMAF scores rank as SVT-AV1 > x265 > aomenc-rt8 > VP9 > x264 > VP8 > openh264.

Among the two encoders studied for AV1, SVT-AV1 provides a much better coding efficiency than aomenc-rt. However, SVT-AV1 lacks an efficient real-time mode. It took SVT-AV1 about 7 times as long as aomenc-rt to encode the same video clips, although both were run using the same speed of 8.

IV. CONCLUSION
A. Latency
Pre-recorded content provides encoders with the capacity to fill up buffers to increase coding efficiency without increasing latency too much. The same buffers, which are filled at I/O speed for pre-recorded content, need to wait for frames to be acquired in live, real-time and interactive use cases, making any operation that requires frame buffers prohibitive.

Very often, benchmarks are only provided for pre-recorded content, and cannot be directly translated into the real-time configuration. One of the contributions of this paper is a process to compute a fair performance comparison of encoders in a real-time situation, while still using standard files as input. We think it will help extend existing test beds to better assess the performance of all codecs in what has become a much more important use case.
B. Coding Efficiency
It is outside the scope of this paper, but it is interesting to note that SVT-AV1 at speed 8 has a coding efficiency similar to or better than aomenc-good at speed 3.

[Fig. 4 panels: (a) Blue sky (25 fps) BS25, (b) Pedestrian area (25 fps) PA25, (c) Riverbed (25 fps) RB25, (d) Rush hour (25 fps) RH25, (e) Station2 (25 fps) ST25, (f) Sunflower (25 fps) SF25, (g) Tractor (25 fps) TR25, (h) Crowd run (50 fps) CR50, (i) Ducks take off (50 fps) DT50, (j) In to tree (50 fps) IT50, (k) Old town cross (50 fps) OT50, (l) Park joy (50 fps) PJ50]
Fig. 4: VMAF scores according to bitrate

The non-real-time version of aomenc has a better coding efficiency than the real-time version. The real-time version of AOM seems to be about 33% less efficient. While theoretically interesting, this has little practical interest, since the default aomenc encoder, even at its maximum setting of speed 6, will have too much latency to be used in interactive cases. The interesting question is: what does one gain or lose when switching from one real-time codec to another, and which ones can achieve the lowest latency?

We find that the real-time modes of the codecs generally perform relative to each other as they would in their non-real-time versions, i.e. AV1 better than VP9 better than VP8, and HEVC better than AVC. The exception is HEVC, which has a better coding efficiency than aomenc-rt, while in non-real-time mode it is aomenc that is reported to have a better coding efficiency than HEVC [9].

x265 exhibited excellent coding efficiency. It is able to encode 1080p videos in real-time at 50 fps. It looks like its speed settings also have very good (low) latency. The authors regret that the licensing situation of HEVC is still complicated.

We have shown that one can expect an average of 11% less bandwidth usage for the same video quality with aomenc-rt than with VP9-rt, and 37% less than with VP8.

The x264 implementation of H.264 is more or less on par with VP8 for quality, while the openh264 implementation of H.264 exhibits generally lower coding efficiency on the 12 video clips of this study. This illustrates that encoder implementations of the same codec can vary a lot in quality, and one should compare implementations and not codecs directly.

Codec implementations improve with time, so it is likely that the still young implementations of the AV1 codec will improve as well.

V. FUTURE WORK
This is the first attempt at comparing several real-time versions of encoders, so there is a lot of room for improvement. One obvious way to improve the results is to test on more datasets, which could, e.g., include different types of content (high-motion videos, animation, live video games) and different resolutions (480p a.k.a. DVD size, 4k, 8k). We think there is still a lot of work to be done to directly measure the latency of encoders.

There are many more implementations of codecs out there, and testing different implementations, including hardware implementations, would add significant value for readers.

ACKNOWLEDGEMENT
The authors would like to thank Andrew Johnson, CTO of Nira Inc., Ioannis Katsavounidis from Netflix, and Google collaborators (Marco Paniconi, Fyodor Kyslov, Michael Horowitz, Yaowu Xu, Jerome Jiang, Danil Chapovalov, ...) for their valuable help, suggestions and comments.

REFERENCES
[1] J. De Cock, A. Mavlankar, A. Moorthy, and A. Aaron, “A large-scale video codec comparison of x264, x265 and libvpx for practical VOD applications,” in Applications of Digital Image Processing XXXIX, vol. SPIE 9971, September 2016.
[2] P. Akyazi and T. Ebrahimi, “Comparison of compression efficiency between HEVC/H.265, VP9 and AV1 based on subjective quality assessments,” in International Conference on Quality of Multimedia Experience (QoMEX)
[...] APSIPA Transactions on Signal and Information Processing, vol. 9, February 2020.
[10] K. Yu, J. Lu, J. Li, and S. Li, “Practical real-time video codec for mobile devices,” in International Conference on Multimedia and Expo (ICME). IEEE, July 2003, pp. 509–512.
[11] American National Standards Institute, Objective video quality measurement using a peak-signal-to-noise-ratio (PSNR) full reference technique, American National Standards Institute, Ad Hoc Group on Video Quality Metrics, 2001.
[12] H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Transactions on Image Processing, vol. 15, pp. 3440–3451, November 2006.
[13] Z. Li, A. Aaron, I. Katsavounidis, A. Moorthy, and M. Manohara, “Toward a practical perceptual video quality metric,” Tech. Rep., June 2016. [Online]. Available: https://netflixtechblog.com/toward-a-practical-perceptual-video-quality-metric-653f208b9652
[14] Netflix. VMAF – Video Multi-Method Assessment Fusion. [Online]. Available: https://github.com/Netflix/vmaf
[15] C. Lee, S. Woo, S. Baek, J. Han, J. Chae, and J. Rim, “Comparison of objective quality models for adaptive bit-streaming services,” in