Performance of AV1 Real-Time Mode

Abstract

With COVID-19, interest in digital interactions has risen, which in turn has put real-time (or low-latency) codecs in a new light. Codec research has traditionally focused on coding efficiency, while very little literature exists on real-time codecs. It is shown how the speed at which content is made available impacts both latency and throughput. The authors introduce a new test setup that integrates a paced reader, which makes it possible to run a codec under the same conditions as real-time media capture. Quality measurements using VMAF, as well as multiple speed measurements, are made on encodings of HD and full HD video sequences, at both 25 fps and 50 fps, to compare the respective performance of several implementations of the H.264, H.265, VP8, VP9 and AV1 codecs.


Ludovic Roux
CoSMo Software

Alexandre Gouaillard
CoSMo Software


Index Terms: Real-time video codecs, video encoding performance

I. INTRODUCTION

A. Codec Use Cases

The term codec usually refers to algorithms that encode and decode a binary representation of media.

Currently there are arguably three major use cases for large-scale codec usage:

• encoding of original raw content, client-side,
• transcoding of content, server-side,
• decoding of content on the receiving side.

Codecs are usually evaluated on coding efficiency [1] [2], that is, the compression/quality ratio achieved against run time. Comparisons are made using the Bjøntegaard rate difference (BD-rate) [3] [4], and multiple papers exist to guide researchers in choosing the most representative dataset, the most meaningful metric, and the best representation of results [5].

For example, the Xiph.org Foundation has developed the very extensive AreWeCompressedYet [6] automatic service to enable comparisons between different implementations of video codecs using various metrics.

With COVID-19 and the new normal, people have a new appetite for more interactive uses of streaming, to get the same experience and value they were enjoying in real life. Correspondingly, demand has increased for faster-than-live streaming ("live" being 5 seconds behind real time) that reaches the sub-500 ms level of latency where human interactions thrive. In parallel, the rise of cloud gaming, augmented reality (AR) and virtual reality (VR) has been pushing video codecs in the same direction, albeit with generated content instead of natural content and even stricter expectations for latency (150 ms maximum) [7].

In this paper we focus on the specific use case of real-time consumption of media for interactive applications. This use case puts a specific emphasis on latency, making coding efficiency a secondary target.

B. Pre-Recorded Content Encoding

The encoding throughput is the total number of frames of the input divided by the total duration of the encoding process. The encoding latency is the time it takes for a single frame to go through the encoding process.

When the content is readily available, the encoding process can be distributed, reducing the encoding time. The total run time used in coding efficiency computation includes both latency and processing time, which dilutes the latency. Latency should be reported separately to be comparable.

In Video-on-Demand (VOD), latency is usually negligible for both encoder and decoder compared to the encoding time. Latency is more often than not left out of consideration, and the main parameter of the encoder is the speed (1 to 8), which represents a compromise between coding efficiency and complexity. Implicitly, the quality follows the speed, although not linearly.

As such, the delay or latency induced by the encoding and decoding processes is almost never taken into account. In fact, assuming that latency is negligible and that the entire media is available, many encoders use a two-pass approach to gain more efficiency per file or movie. Netflix is known to have pioneered distributed "per-chunk" and "per-shot" techniques, investing time and resources in the encoding process in exchange for maximum efficiency [8].

The most recent codec performance study [9] compared the libaom AV1 encoder against x265 and libvpx-vp9 using their best quality mode and two-pass compression.

C. Live Content Capture Speed

A big difference between recorded content and real-time content is the speed at which frames become available and the corresponding impact on latency. For example, real-time or live media needs to be encoded in one pass, and not in two or more passes as algorithms focused on coding efficiency would do.

VMAF   MOS   ACR label   Error visibility
100    5     excellent   Imperceptible
80     4     good        Perceptible error, not annoying
60     3     fair        Visible error, slightly annoying
40     2     poor        Visible error, annoying
20     1     bad         Visible error, very annoying

TABLE I: Mapping of VMAF scores to MOS values

1) Real-time encoder cannot encode faster than real-time:

Even if the encoder were able to encode one frame at a time at faster-than-real-time speed, it would still need to wait for the capturer to provide the next frame before it could process it.

2) Real-time encoder latency is correlated with buffer depth:

Consider a frame buffer that is 60 frames deep. With live content, one must wait until those frames are generated before any kind of processing can start. At a capture rate of 30 fps, one must wait 2 seconds before the buffer is ready to be used, even if the encoder can then encode faster than 30 fps.

To be able to compare traditional codecs (default settings) and the real-time modes of certain codec implementations under the same conditions, we need to make sure that the frames are delivered to the encoder at real-time speed, even if the content is pre-recorded.
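The two observations above reduce to simple arithmetic. The following minimal sketch is an illustration only and is not part of the measurement setup: the function names are hypothetical, and it assumes a constant capture rate and a frame buffer that must be completely filled before encoding can start.

```python
def buffering_delay_s(buffer_depth_frames, capture_fps):
    """Seconds to wait until a frame buffer of the given depth is full,
    assuming frames arrive at a constant capture rate."""
    return buffer_depth_frames / capture_fps


def effective_throughput_fps(encoder_fps, capture_fps):
    """Throughput of a live encoder: it can be slower than the capture
    rate, but never faster, because it has to wait for the next frame."""
    return min(encoder_fps, capture_fps)


# 60-frame buffer filled at 30 fps: 2.0 seconds before encoding can start.
delay = buffering_delay_s(60, 30)

# An encoder needing 5 ms per frame (200 fps) fed by a 25 fps camera
# still delivers only 25 encoded frames per second.
live_fps = effective_throughput_fps(1000 / 5, 25)
```

Under these assumptions, the only ways to lower the buffering delay are a shallower buffer or a higher capture rate, whatever the raw speed of the encoder.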

D. Codecs Performance in Real-Time Use Cases

The run time is latency plus encoding time, and the latency is a function of both the depth of any frame buffer and the constant capture rate. Increasing the frame rate reduces latency but increases the workload. The easiest way to reduce the latency is to reduce the size of the frame buffer, or to remove the need for a frame buffer altogether.

This is not to be confused with the encoder's speed setting. Most recent codec designs involve sub-algorithms, referred to as "tools." Some tools are more demanding in terms of complexity or latency than others, and not all have the same impact on efficiency. As described in [10], Figure 1, motion estimation is the dominant contributor to the run-time budget in encoding.

The speed setting maps to the choice of a specific subset of tools, defining a certain trade-off between coding efficiency and encoding speed (possibly disregarding latency completely). Some codecs also have an explicit real-time mode, with settings that relate to the trade-off between coding efficiency and latency.

This comes at a cost in terms of coding efficiency, and makes the real-time mode of codecs not directly comparable with rate-compression graphs. There is no study to date about how much coding efficiency is traded off for decreased latency.

In this paper, we concentrate on the real-time mode of some of the encoder implementations for H.264 (AVC), VP8, VP9, and AV1, which are all used in the webrtc.org code today. An implementation of H.265 (HEVC) is also used for comparison.

(a) Blue sky (BS25)  (b) Pedestrian area (PA25)  (c) Riverbed (RB25)  (d) Rush hour (RH25)  (e) Station2 (ST25)  (f) Sunflower (SF25)  (g) Tractor (TR25)  (h) Crowd run (CR50)  (i) Ducks take off (DT50)  (j) In to tree (IT50)  (k) Old town cross (OT50)  (l) Park joy (PJ50)

Fig. 1: Snapshots of each video sequence

II. METHODOLOGY

A. Quality Metric

Compression can be objectively evaluated. The quality of video frames can also be evaluated objectively; however, it has been shown that scores provided by objective metrics like PSNR (Peak Signal-to-Noise Ratio) [11] correlate poorly with human evaluation of image or video quality [12]. To address this drawback, perceptual metrics like VMAF (Video Multimethod Assessment Fusion) [13], [14] were introduced. A recent study on the evaluation of objective video quality metrics has demonstrated a good correlation between subjective scores given by humans and VMAF scores [15]. We have followed the latest trend in codec research and used VMAF in this study.

To give an interpretation of a VMAF score, one can relate it to the typical Mean Opinion Score (MOS) value ranging from 1 to 5. A very common rating scale for MOS is the Absolute Category Rating (ACR) methodology [16]: "bad," "poor," "fair," "good" and "excellent." VMAF gives a score in the range [0, 100]. A VMAF score of 20 can be mapped to "bad," 40 to "poor," 60 to "fair," 80 to "good," and 100 to "excellent" [17]. Table I summarizes the relationship between VMAF scores and MOS values.

TABLE II: List of 12 video sequences used in this study (columns: Filename, Short name, fps, Duration)

B. Datasets

For easier comparison of results, we have used video sequences having the same resolution, the same color space and the same bit depth. Only the duration and the frame rate differ from one video to another.

We have focused on 1080p HD video. This is the resolution recommended for computing VMAF scores with the default model v0.6.1 [17]. Table II gives the list of the 12 videos used in our study.

All the video sequences use the YUV format with a bit depth of 8 bits and are not compressed. They have been selected from the publicly available Xiph.org Video Test Media [derf's collection] dataset (https://media.xiph.org/video/derf/).

There are two groups of videos: a group of 7 videos with a frame rate of 25 fps, and a group of 5 videos with a frame rate of 50 fps. Fig. 1 shows a snapshot of each video.

C. Video Codecs

We compare the performance of eight encoders, namely aomenc (default and real-time settings) and SvtAv1EncApp for AV1, vpxenc for VP8 and VP9, x265 for HEVC, and x264 and h264enc for H.264, compiled in their real-time mode when available, using speed 8 when applicable, and using various bitrate targets.

Table III gives, for each codec, all the information needed to reproduce the results: the version we have used, where to get the source code and which options we selected to compile it.

To give an insight into the difference in encoding performance between real-time mode and non-real-time mode, for AV1 only, we have compiled a second version of the AOM encoder aomenc using the option -DCONFIG_REALTIME_ONLY=0. For this version we have selected the speed option --cpu-used=3.

For the real-time version of aomenc, we have selected the highest speed option --cpu-used=8. We refer to aomenc in real-time mode as aomenc-rt (encoder called with option --rt), and to aomenc not in real-time mode as aomenc-good (encoder called with the good quality option --good).

We also selected the highest speed option --preset 8 for SVT-AV1.

We selected the setting --preset superfast for x265, and the setting --preset medium for x264. In both cases, this was enough to reach the maximum target of 50 fps.

The options used at run time to launch each codec are given in Table IV.

Compilation of encoders and encoding of videos have been performed on a Dell™ OptiPlex 5050 with an Intel® Core™ i7-7700T processor (8 cores at 2.90 GHz) and 16 GB of memory, running the Ubuntu Desktop 20.04.1 64-bit operating system.

D. Real-Time Encoding Evaluation Process

The evaluation of a video encoder is usually performed by letting the encoder read the video file to be encoded. This is not a realistic mode of operation for real-time use, where the frames to be encoded are only available after some delay. For example, when encoding frames received from a camera at a rate of 25 fps, each new frame is available only after a delay of 1/25 second, that is, 40 ms. Even if the encoder is able to encode a frame in 5 ms, it has to wait until the next frame becomes available. The encoder is starved of input.

To evaluate the encoders in a more realistic process, we introduce a pacer program between the video file and the encoder (see Fig. 2). The objective of the pacer is to deliver frames to the encoder only at a selected frame rate.

A raw video file to be encoded is read by the pacer program. The pacer is in charge of simulating the real-time delivery of images at a selected rate. When the pacer reads frames from a 25 fps video file, it outputs a frame every 1/25th second.

Codec: AV1
Encoder and version: libaom 2.0.0
Source code: https://aomedia.googlesource.com/aom/
Configuration options: cmake -DCMAKE_BUILD_TYPE=Release -DCONFIG_MULTITHREAD=1 -DCONFIG_PIC=1 -DCONFIG_REALTIME_ONLY=1 -DCONFIG_RUNTIME_CPU_DETECT=1 -DCONFIG_WEBM_IO=0

Codec: AV1
Encoder and version: SVT-AV1 0.8.4
Source code: https://github.com/OpenVisualCloud/SVT-AV1
Configuration options: build.sh release

Codec: VP8, VP9
Encoder and version: libvpx 1.9.0
Source code: https://chromium.googlesource.com/webm/libvpx
Configuration options: configure --enable-pic --enable-realtime-only --enable-multi-res-encoding --disable-debug --cpu=x86-64

Codec: H.265
Encoder and version: x265 release 3.5
Source code: https://github.com/videolan/x265
Configuration options: cmake -DCMAKE_BUILD_TYPE=Release

Codec: H.264
Encoder and version: x264 stable cde9a933
Source code: https://code.videolan.org/videolan/x264
Configuration options: make

Codec: H.264
Encoder and version: openh264 2.1.1
Source code: https://github.com/cisco/openh264
Configuration options: make OS=linux ARCH=x86_64

TABLE III: Video encoders used in this study
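The pacing step of this section can be illustrated with a short sketch. This is not the authors' pacer program: the function names, the use of absolute deadlines, the release of the first frame at time zero, and the 4:2:0 chroma subsampling used to compute the frame size are all assumptions made for illustration. The sketch copies fixed-size raw frames from an input stream to an output stream (for instance a pipe read by the encoder) and releases frame k no earlier than k/fps seconds after the start.

```python
import time


def frame_size_yuv420(width, height):
    """Bytes in one 8-bit YUV frame, assuming 4:2:0 subsampling:
    a full-size luma plane plus two quarter-size chroma planes."""
    return width * height * 3 // 2


def pace_frames(src, dst, frame_size, fps,
                clock=time.monotonic, sleep=time.sleep):
    """Copy raw frames from src to dst at no more than `fps` frames per second.

    src and dst are binary file-like objects. Returns the number of
    complete frames delivered.
    """
    period = 1.0 / fps
    start = clock()
    delivered = 0
    while True:
        frame = src.read(frame_size)
        if len(frame) < frame_size:
            break  # end of input (an incomplete trailing frame is dropped)
        # Deadlines are measured from the start time, so small timing
        # errors on one frame do not accumulate over the sequence.
        deadline = start + delivered * period
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)
        dst.write(frame)
        delivered += 1
    return delivered
```

With a real file, src could be the raw video opened in binary mode and dst the write end of the pipe feeding the encoder. The clock and sleep parameters exist only so that the timing logic can be exercised without actually waiting.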

aomenc rt:
aomenc --codec=av1 --profile=0 --kf-max-dist=90000 --end-usage=cbr --min-q=1 --max-q=63 --undershoot-pct=50 --overshoot-pct=50 --buf-sz=1000 --buf-initial-sz=500 --buf-optimal-sz=600 --max-intra-rate=300 --passes=1 --rt --lag-in-frames=0 --error-resilient=0 --tile-columns=0 --aq-mode=3 --enable-obmc=0 --enable-global-motion=0 --enable-warped-motion=0 --deltaq-mode=0 --enable-tpl-model=0 --mode-cost-upd-freq=2 --coeff-cost-upd-freq=2 --enable-ref-frame-mvs=0 --mv-cost-upd-freq=3 --enable-order-hint=0 --cpu-used=8 --threads=8 --end-usage=cbr --target-bitrate=xxx --fps=25/1

aomenc good:
aomenc --codec=av1 --good --passes=1 --cpu-used=3 --threads=8 --lag-in-frames=25 --min-q=0 --max-q=63 --auto-alt-ref=1 --kf-max-dist=150 --kf-min-dist=0 --drop-frame=0 --static-thresh=0 --arnr-maxframes=7 --arnr-strength=5 --sharpness=0 --undershoot-pct=100 --overshoot-pct=100 --frame-parallel=0 --tile-columns=0 --profile=0 --target-bitrate=xxx --fps=25/1

SVT-AV1:
SvtAv1EncApp --tbr xxx --fps 25 --preset 8 --pred-struct 0 --profile 0 --rc 2 --min-qp 1 --max-qp 63 --vbv-bufsize 1 --tile-columns 0 --enable-global-motion 1 --enable-local-warp 0 --adaptive-quantization 0

VP8:
vpxenc --codec=vp8 --lag-in-frames=0 --error-resilient=0 --kf-max-dist=90000 --static-thresh=0 --end-usage=cbr --undershoot-pct=50 --overshoot-pct=50 --buf-sz=1000 --buf-initial-sz=500 --buf-optimal-sz=600 --max-intra-rate=300 --resize-allowed=0 --drop-frame=0 --passes=1 --rt --noise-sensitivity=0 --cpu-used=-6 --threads=8 --min-q=1 --max-q=63 --screen-content-mode=0 --target-bitrate=xxx --fps=25/1

VP9:
vpxenc --codec=vp9 --lag-in-frames=0 --error-resilient=0 --kf-max-dist=90000 --static-thresh=0 --end-usage=cbr --undershoot-pct=50 --overshoot-pct=50 --buf-sz=1000 --buf-initial-sz=500 --buf-optimal-sz=600 --max-intra-rate=300 --resize-allowed=0 --drop-frame=0 --passes=1 --rt --noise-sensitivity=0 --cpu-used=7 --threads=8 --profile=0 --min-q=1 --max-q=63 --tile-columns=0 --aq-mode=3 --target-bitrate=xxx --fps=25/1

x265:
x265 --frame-threads 0 --preset superfast --bitrate xxx --fps 25

x264:
x264 --preset medium --bitrate xxx --fps 25 --demuxer raw

openh264:
h264enc -rc 0 -complexity 2 -denois 0 -scene 0 -bgd 0 -fs 0 -numl 1 -tarb 0 xxx -frout 0 25

TABLE IV: Options used for encoders at run time (encoding has been performed at ten target bitrates: 800, 900, 1000, 1250, 1500, 1750, 2000, 2500, 5000 and 10000 kbps)

For a 50 fps video file, the pacer will output a frame every 1/50th second. Frames delivered by the pacer at the selected rate are written to a Unix pipe. The video encoder being tested reads frames from the Unix pipe. The output of the encoder is a bitstream corresponding to the encoded video file. Finally, we evaluate the quality of the encoded video using the VMAF video quality assessment tool.

III. RESULTS AND ANALYSIS

We measured the VMAF scores for each encoder at ten different target bitrates: 800, 900, 1000, 1250, 1500, 1750, 2000, 2500, 5000 and 10000 kbps (see VMAF graphs in Figures 4a to 4l), and computed the BD-rates from the VMAF curves according to bitrate (see Tables V and VI).
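As an illustration of how a BD-rate can be derived from two VMAF-versus-bitrate curves, the sketch below uses a simplified area-based variant rather than the classic Bjøntegaard procedure, which fits polynomials to log-bitrate. It is not the code used to produce Tables V and VI. The function names, the piecewise-linear interpolation, the normalization by the reference curve, and the sample points are all assumptions; each curve is assumed to have VMAF strictly increasing with bitrate.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the sampled range")


def mean_bitrate(curve, q_lo, q_hi, steps=1000):
    """Average bitrate of a curve over the quality range [q_lo, q_hi].

    curve is a list of (bitrate_kbps, vmaf) points sorted by bitrate.
    """
    rates = [r for r, _ in curve]
    quals = [q for _, q in curve]
    samples = []
    for i in range(steps + 1):
        q = q_lo + (q_hi - q_lo) * i / steps
        q = min(max(q, q_lo), q_hi)  # guard against floating-point overshoot
        samples.append(interpolate(q, quals, rates))
    # Trapezoid rule on a uniform grid, divided by the width of the range.
    return (sum(samples) - 0.5 * (samples[0] + samples[-1])) / steps


def area_bd_rate(test, ref):
    """Average bitrate difference (in percent) of `test` relative to `ref`
    over their common VMAF range; negative values mean `test` saves bitrate."""
    q_lo = max(min(q for _, q in test), min(q for _, q in ref))
    q_hi = min(max(q for _, q in test), max(q for _, q in ref))
    if q_lo >= q_hi:
        raise ValueError("the curves have no common quality range")
    r_test = mean_bitrate(test, q_lo, q_hi)
    r_ref = mean_bitrate(ref, q_lo, q_hi)
    return 100.0 * (r_test - r_ref) / r_ref


# Made-up points for illustration only (not measurements from this study):
# the test encoder needs 20% less bitrate than the reference at every score.
reference = [(1000, 40.0), (2000, 60.0), (4000, 80.0)]
candidate = [(800, 40.0), (1600, 60.0), (3200, 80.0)]
saving = area_bd_rate(candidate, reference)
```

Swapping the roles of bitrate and VMAF in the same procedure gives an average quality difference at equal bitrate, which is the idea behind the BD-VMAF values.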

A. Pacer: Latency Impact of Real-Time Streams

We first measured the encoding throughput without using the pacer (the encoder reads the frames directly from the video file), which, supposing the encoding speed is limited by the encoder and not by the I/O speed, provides the maximum speed at which the encoder can operate given the input content, the bitrate target, and the hardware. We then measured the throughput using the pacer.

Fig. 2: Evaluation of video encoder in real-time mode

TABLE V: BD-rate for aomenc-rt8 (Note: SVT = SVT-AV1). Columns: openh264, x264, VP8, VP9, x265, SVT. Rows: BS25, PA25, RB25, RH25, ST25, SF25, TR25, Avg 25, CR50, DT50, IT50, OT50, PJ50, Avg 50.

TABLE VI: BD-VMAF for aomenc-rt8 (Note: SVT = SVT-AV1). Columns: openh264, x264, VP8, VP9, x265, SVT. Rows: BS25, PA25, RB25, RH25, ST25, SF25, TR25, Avg 25, CR50, DT50, IT50, OT50, PJ50, Avg 50.

Fig. 3a and 3b show the encoding throughput achieved by the encoders when they are not limited by the pacer. At the threshold bitrate of 2500 kbps or less, all the encoders are able to deliver 25 fps or more on our test computer, except SVT-AV1, which at best reaches only 5.3 fps.

It is interesting to note that, when requested to encode at a rate of 50 fps instead of 25 fps, all the encoders perform their compression faster, although three of the five videos in the 50 fps group are rather difficult to encode. This increase in compression speed allows x264 and x265 to reach 50 fps at the threshold bitrate of 2500 kbps, while VP9 and aomenc-rt are not able to deliver more than 30 fps.

Generally, the encoders work at a more homogeneous speed at higher bitrates, as shown by the shorter standard deviation bars, while the measured frame rate varies much more at lower bitrates. In fact, at low bitrates, the encoders process videos that are easy to encode (Blue-sky, Pedestrian-area, Rush-hour, Station2, Sunflower, Tractor, In-to-tree, Old-town-cross) much faster than videos that are harder to encode (Riverbed, Crowd-run, Ducks-take-off, Park-joy).

(a) Average encoding speed of the seven 25 fps videos without pacer  (b) Average encoding speed of the five 50 fps videos without pacer  (c) Average encoding speed of the seven 25 fps videos with pacer  (d) Average encoding speed of the five 50 fps videos with pacer

Fig. 3: Average encoding speed of 1080p videos on a Dell™ OptiPlex 5050 with an Intel® Core™ i7-7700T processor (8 cores at 2.90 GHz) and 16 GB of memory, running the Ubuntu Desktop 20.04.1 64-bit operating system.

Apart from the settings, these are the conditions under which most codec studies are done and performance is reported.

One can see that, in the case of x264 and x265, latency rather than encoding speed is the bottleneck of the throughput, as these encoders can encode 50 fps media content faster than 25 fps content. Supposing a constant I/O speed, the time to read a frame (latency) is constant whether the media content was originally captured at 25 or 50 fps; however, the complexity per second (linear in the number of pixels to encode) is twice as high.

The opposite holds for libvpx, for which we can clearly see that encoding 50 fps content is slower than encoding 25 fps content, with both VP8 and VP9.

Introducing the pacer, we measured the frame rates shown in Fig. 3c and 3d.

All the encoders which could deliver more than 25 fps in the previous experiment have no problem delivering with a paced input. SVT-AV1 remains far too slow for real-time use.

All the encoders which could deliver more than 50 fps in the previous experiment, like VP8 and x264, again have no problem. Interestingly, x265 performs extremely well. It is also interesting to note that aomenc-rt is slightly faster than VP9, while one would expect the opposite. At this point we can conclude that there is no CPU penalty when moving from VP9 to AV1 real-time.

B. Interpretation of BD-rate and BD-VMAF

A BD-rate is a measure of the average percentage bitrate savings that can be obtained for the same visual quality level. This measure is computed over the range of quality levels that are common to two curves.

For example, let us consider the VMAF scores in Fig. 4i for the Ducks-take-off (DT50) video. We want to compute the bitrate savings at the same VMAF level of aomenc-rt8 (blue curve, VMAF range from 27 to 61) as compared to x264 (orange curve, VMAF range from 6 to 55). The common VMAF range for these two curves is 27 to 55. Using that common quality range, we compute the average bitrate savings by calculating the area between the curves (to the left of the x264 orange curve and to the right of the aomenc-rt8 blue curve), and we divide it by the area to the left of the aomenc-rt8 blue curve up to the right of the y-axis. We get a BD-rate of -21.68%, as shown in Table V. It is a negative value, which means that there is a reduction in bitrate for aomenc-rt8 as compared to x264. The interpretation of this value is that, for the same VMAF score, we may expect aomenc-rt8 to give on average a 21.68% bitrate saving as compared to x264.

Similarly, we can compute the average visual quality improvement for the same bitrate between aomenc-rt8 and x264 by switching the variables. Using the same example, we look for the common bitrate range between aomenc-rt8 (blue curve, bitrate range from 2160 to 9925 kbps) and x264 (orange curve, bitrate range from 810 to 10000 kbps). The common bitrate range for these two curves is 2160 to 9925 kbps. Using that common bitrate range, we compute the average VMAF improvement by calculating the area between the curves (below the aomenc-rt8 blue curve and above the x264 orange curve), and we divide it by the area below the x264 orange curve down to the x-axis. We get the BD-VMAF value shown in Table VI. It is a positive value, which means that there is an increase in VMAF score for aomenc-rt8 as compared to x264. The interpretation of this value is that, for the same bitrate, we may expect aomenc-rt8 to give on average a VMAF score that many points higher than x264.

C. Discussion

VMAF scores are reported in Fig. 4a to 4l. We have highlighted a bitrate threshold of 2500 kbps, as this value is known to be a hard-coded maximum for WebRTC within the Chromium browser, although native applications using WebRTC are not subject to such a limitation.

1) 25 fps datasets:

VMAF scores show that all the videos of the 25 fps group except Riverbed are relatively easy to encode. The curves are close or very close to the perfect score of 100 for most bitrates. It is only at low bitrates of roughly 1000 kbps or lower that the VMAF score drops below 80. The quality of videos encoded by openh264 starts to worsen at higher bitrates, and worsens faster, than for the other encoders. VP8 also shows a slightly lower video quality than the other encoders. At the threshold bitrate of 2500 kbps, the quality of encoded videos is excellent or good, with a VMAF score above 80.

Riverbed is the only video of the 25 fps group that is difficult to encode with good quality. Although this video is static, the waves on the surface of the water and the reflection of light on the waves are challenging for encoders. Openh264 and VP8 are not able to encode this video at the threshold bitrate of 2500 kbps. The lowest bitrate delivered by openh264 for Riverbed is 3760 kbps, while it is 3460 kbps for VP8. The VMAF rating at the threshold bitrate is only between 30 and 50, which is poor quality.

2) 50 fps datasets:

The group of 50 fps videos was more difficult to encode with good quality than the group of 25 fps videos. Ducks-take-off was the most challenging video to encode. Interestingly, like Riverbed, it is a static video showing water with waves.

At the threshold bitrate of 2500 kbps, the VMAF score is lower than 80 for the videos in the 50 fps group, except for Old-town-cross, which has very slow motion.

We notice again that openh264 provides noticeably lower VMAF scores than the other encoders. There are three videos, Crowd-run, Ducks-take-off and Park-joy, which openh264 and VP8 are unable to encode at the threshold bitrate of 2500 kbps. They would require a target bitrate of 6000 kbps or more.

The other encoders, aomenc-rt8, SVT-AV1, x265, x264 and VP9, are able to encode all twelve videos with rather similar quality. The quality of videos encoded by SVT-AV1 is always the best, x265 comes second and aomenc-rt8 is third.

From Table VI, which gives the BD-VMAF for aomenc-rt8, we can rank the encoders relative to aomenc-rt8. We can see that, for the same bitrate, the VMAF scores rank as SVT-AV1 > x265 > aomenc-rt8 > VP9 > x264 > VP8 > openh264.

Of the two encoders studied for AV1, SVT-AV1 provides a much better coding efficiency than aomenc-rt. However, SVT-AV1 lacks an efficient real-time mode. It took SVT-AV1 about 7 times as long as aomenc-rt to encode the same video clips, although both were run at the same speed setting of 8.

IV. CONCLUSION

A. Latency

Pre-recorded content gives encoders the capacity to fill up buffers to increase the coding efficiency without increasing the latency too much. The same buffers, which are filled at I/O speed for pre-recorded content, need to wait for frames to be acquired in live, real-time and interactive use cases, making any operation that requires frame buffers prohibitive.

Very often, benchmarks are only provided for pre-recorded content, and cannot be directly translated to the real-time configuration. One of the contributions of this paper is a process for a fair performance comparison of encoders in a real-time situation, while still using standard files as input. We think it will help extend existing test beds to better assess the performance of all codecs in what has become a much more important use case.

B. Coding Efﬁciency

It is outside the scope of this paper, but it is interesting to note that SVT-AV1 at speed 8 has a coding efficiency similar to or better than aomenc-good at speed 3.

(a) Blue sky (25 fps) BS25  (b) Pedestrian area (25 fps) PA25  (c) Riverbed (25 fps) RB25  (d) Rush hour (25 fps) RH25  (e) Station2 (25 fps) ST25  (f) Sunflower (25 fps) SF25  (g) Tractor (25 fps) TR25  (h) Crowd run (50 fps) CR50  (i) Ducks take off (50 fps) DT50  (j) In to tree (50 fps) IT50  (k) Old town cross (50 fps) OT50  (l) Park joy (50 fps) PJ50

Fig. 4: VMAF scores according to bitrate

The non-real-time version of aomenc has a better coding efficiency than the real-time version. The real-time version of AOM seems to be about 33% less efficient. While theoretically interesting, this has little practical interest, since the default aomenc encoder, even at its maximum setting of speed 6, will have too much latency to be used in interactive cases. The interesting question is: what does one gain or lose when switching from one real-time codec to another, and which ones can achieve the lowest latency?

We find that the real-time modes of the codecs generally perform relative to each other as their non-real-time versions do, i.e. AV1 better than VP9, VP9 better than VP8, and HEVC better than AVC. The exception is HEVC, which has a better coding efficiency than aomenc-rt, while in non-real-time mode it is aomenc that is reported to have a better coding efficiency than HEVC [9].

x265 exhibited excellent coding efficiency. It is able to encode 1080p videos in real time at 50 fps. It appears that its speed settings also give very good (low) latency. The authors regret that the licensing situation of HEVC is still complicated.

We have shown that one can expect on average 11% less bandwidth usage for the same video quality with aomenc-rt than with VP9-rt, and 37% less than with VP8.

The x264 implementation of H.264 is more or less on par with VP8 for quality, while the openh264 implementation of H.264 generally exhibits lower coding efficiency on the 12 video clips of this study. This illustrates that encoder implementations of the same codec can vary a lot in quality, and that one should compare implementations and not codecs directly.

Codec implementations improve with time, so it is likely that the still young implementations of the AV1 codec will improve as well.

V. FUTURE WORK

This is the first attempt at comparing several real-time versions of encoders, so there is a lot of room for improvement. One obvious way to improve the results is to test on more datasets, which could for example include different types of content (high-motion videos, animation, live video games) and different resolutions (480p, a.k.a. DVD size, 4K, 8K). We think there is still a lot of work to be done to directly measure the latency of encoders.

There are many more implementations of codecs available, and testing different implementations, including hardware implementations, would add significant value for readers.

ACKNOWLEDGEMENT

The authors would like to thank Andrew Johnson, CTO of Nira Inc., Ioannis Katsavounidis from Netflix, and Google collaborators (Marco Paniconi, Fyodor Kyslov, Michael Horowitz, Yaowu Xu, Jerome Jiang, Danil Chapovalov, ...) for their valuable help, suggestions and comments.

REFERENCES

[1] J. De Cock, A. Mavlankar, A. Moorthy, and A. Aaron, "A large-scale video codec comparison of x264, x265 and libvpx for practical VOD applications," in Applications of Digital Image Processing XXXIX, vol. SPIE 9971, September 2016.

[2] P. Akyazi and T. Ebrahimi, "Comparison of compression efficiency between HEVC/H.265, VP9 and AV1 based on subjective quality assessments," in International Conference on Quality of Multimedia Experience (QoMEX)

[...] APSIPA Transactions on Signal and Information Processing, vol. 9, February 2020.

[10] K. Yu, J. Lu, J. Li, and S. Li, "Practical real-time video codec for mobile devices," in International Conference on Multimedia and Expo (ICME). IEEE, July 2003, pp. 509–512.

[11] American National Standards Institute, Objective video quality measurement using a peak-signal-to-noise-ratio (PSNR) full reference technique, American National Standards Institute, Ad Hoc Group on Video Quality Metrics, 2001.

[12] H. R. Sheikh, M. F. Sabir, and A. C. Bovik, "A statistical evaluation of recent full reference image quality assessment algorithms," IEEE Transactions on Image Processing, vol. 15, pp. 3440–3451, November 2006.

[13] Z. Li, A. Aaron, I. Katsavounidis, A. Moorthy, and M. Manohara, "Toward a practical perceptual video quality metric," Tech. Rep., June 2016. [Online]. Available: https://netflixtechblog.com/toward-a-practical-perceptual-video-quality-metric-653f208b9652

[14] Netflix. VMAF – Video Multi-Method Assessment Fusion. [Online]. Available: https://github.com/Netflix/vmaf

[15] C. Lee, S. Woo, S. Baek, J. Han, J. Chae, and J. Rim, "Comparison of objective quality models for adaptive bit-streaming services," in