Comparing H.265/HEVC and VP9: Impact of High Frame Rates on the Perceptual Quality of Compressed Videos
PRE-PRINT SUBMITTED TO ARXIV.ORG
Tariq Rahim, Member, IEEE, Muhammad Arslan Usman, Member, IEEE, and Soo Young Shin, Senior Member, IEEE
Abstract: High frame rates are known to enhance the perceived visual quality of specific video content. However, the lack of investigation into high frame rates has restricted the expansion of this research field, particularly in the context of full-high-definition (FHD) and 4K ultra-high-definition video formats. This study presents a subjective and objective quality assessment of compressed FHD videos. First, we compress the FHD videos with High Efficiency Video Coding (HEVC) and VP9 at five quantization parameter (QP) levels for multiple frame rates, i.e., 15fps, 30fps, and 60fps. The FHD videos are obtained from the high-frame-rate video database BVI-HFR, which spans various scenes, colors, and motions and has been shown to be representative of BBC broadcast content. Second, a detailed subjective quality assessment of the compressed videos is conducted for both encoders and each frame rate, yielding subjective measurements in the form of the differential mean opinion score (DMOS), which reflects the quality of experience. In particular, the aim is to investigate the impact of compression on the perceptual quality of compressed FHD videos and to compare the performance of both encoders at each frame rate. Finally, 11 state-of-the-art objective quality assessment metrics are benchmarked against the subjective measurements, using correlation coefficients as a statistical evaluation of the agreement between the two. After this extensive investigation, a recommendation for enhancing the quality estimation of full-reference (FR) video quality measurements (VQMs) is presented.
Index Terms: Differential mean opinion score (DMOS), full high definition (FHD), quantization parameter (QP), subjective quality assessment, objective quality assessment.
I. INTRODUCTION

With the progress in technology for capturing, storing, transmitting, and displaying video content, high-quality video services have recently become prevalent. Nowadays, most broadcasting companies and online video websites provide video content at high-definition (HD) resolutions. Following the success of HD video services, the 4K ultra-high-definition (UHD) resolution format with 3840 × 2160 pixels is regarded as the future standard in video applications [1]. Recently, there has been an increased focus on the implementation of high-spatial-resolution (4K/8K), high-dynamic-range (HDR), and immersive multi-view formats.

T. Rahim and S. Y. Shin are with the WENS Laboratory, Department of IT Convergence Engineering, Kumoh National Institute of Technology, Gumi 39177, South Korea (e-mail: [email protected]; [email protected]). Muhammad Arslan Usman is working jointly with Kingston University London and Pangea Connected London under the umbrella of Knowledge Transfer Partnerships (KTP), United Kingdom.
Still, progress on high-frame-rate formats has been relatively slow, as evident from the frame rates of entertainment videos, such as cinematic films and TV programs, which seldom exceed 60fps [2]. Ideally, full-high-definition (FHD) content is expected to provide viewers with an improved visual experience through a wide field of view, both horizontally and vertically, on suitable screen sizes. FHD, with 1920 × 1080 pixels, has roughly twice the spatial resolution of HD Ready and can therefore deliver a larger amount of visual information to viewers. This increase in resolution is the initial stage of an immersive and naturalistic visual experience [3].

Recently, high frame rates (HFRs) have stimulated interest in communities such as broadcast, film (Avatar, Billy Lynn's Long Halftime Walk), online streaming, virtual reality, and gaming. For the ultra-high-definition video standard (Rec. 2020), frame rates of up to 120fps have been specified [4]. Moreover, the need for higher-resolution and HFR videos is growing because of the availability of 4K and 8K UHD content and larger display screens. Although frame interpolation and various post-processing methods can alleviate the artifacts found at low frame rates, satisfactory results have not been obtained. Because the dynamics vary within a sequence, human viewers do not require a fixed HFR for an entire video; for instance, segments with low dynamics can be coded at a lower frame rate [5].

Nevertheless, the FHD format poses a new challenge: managing the increased amount of data in FHD video services requires more storage capacity and bandwidth. To address this problem, efficient video compression is necessary. High Efficiency Video Coding (HEVC) [6] is the latest standard for video compression, developed by the Joint Collaborative Team on Video Coding, which was founded in 2010 by the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group.
In 2013, the final HEVC specification was approved by ITU-T as Recommendation H.265 and by ISO/IEC as MPEG-H Part 2. VP9 [7], developed as part of the WebM project, is another, more recently introduced open-source compression format. For any encoder, the compression level and the frame rate directly affect the quality of the video content. Hence, it is necessary to investigate the impact of the compression level on users' perceived quality at different frame rates and with different encoders.

Before HFR is exploited in future video formats, further research is required to identify the contribution of different frame rates across the entire video pipeline, i.e., from acquisition through compression and transmission to visual perception. Various attempts have been made to determine the relation between frame rate and perceived quality, such as the examination of motion blur perception [8] at a frame rate of 30Hz. Similarly, a subjective test using QCIF and CIF videos at frame rates below 30fps was conducted to investigate the impact of frame rate and quantization parameter on perceptual video quality [9]. Studies of frame rates above 30fps are rare, and those reported in [2], [10], and [11] concentrate on either high-resolution videos, such as 4K, or gaming environments. Meanwhile, very few HFR databases have been publicly released, which limits research and makes robust inferences about HFR difficult. The available databases are mostly based on either a single frame rate or low frame rates.
To address this limitation, we use the publicly available Bristol Vision Institute High Frame Rate (BVI-HFR) video database [10], which contains 22 diverse uncompressed FHD video sequences with a resolution of 1920 × 1080, each of 10 s duration, with source videos at 120fps.

In this paper, we use the BVI-HFR database to investigate the impact of compression on perceptual quality by encoding the video content with the H.265/HEVC and VP9 encoders at five different QP levels. A detailed subjective quality assessment of the compressed videos is conducted for both encoders and each frame rate, yielding subjective measurements in the form of DMOS that reflect the quality of experience. In particular, the aim is to investigate and compare the impact of compression on the perceptual quality of compressed FHD videos. Then, 11 state-of-the-art objective quality assessment metrics are benchmarked against the subjective measurements, using correlation coefficients as a statistical evaluation of the agreement between the two. This study investigates the performance of the selected encoders at different frame rates under different QP levels. The native frame rate of the BVI-HFR videos is 120fps, and each video is temporally down-sampled by frame averaging to 60fps, 30fps, and 15fps [12]. We use the 15fps, 30fps, and 60fps versions for our investigation.

The major contributions of this study are outlined as follows:
• First, FHD video content is compressed using H.265/HEVC and VP9 at five QP levels (27, 31, 35, 39, and 43) at frame rates of 15fps, 30fps, and 60fps. The compression is performed separately for each frame rate and for both encoders.
• Second, a detailed subjective quality assessment of the compressed video content at 15fps, 30fps, and 60fps is conducted to generate DMOS values reflecting the perceptual quality of the users.
• Employing 11 state-of-the-art FR objective VQA metrics, we quantify the relation between the subjective measurements (i.e., DMOS) and the FR-VQA metrics. The aim is to investigate the impact of frame rate variations on perceptual quality and to compare the performance of the selected encoders at different QP levels. Based on a statistical evaluation of both models in terms of correlation coefficients (cc), FR-VQA metrics are recommended for FHD content compressed with H.265/HEVC and VP9.
• Finally, a recommendation for enhancing the quality estimation of full-reference (FR) video quality measurements (VQMs) is presented after the extensive investigation.

II. RELATED WORK
A. Advantages of Increased Frame Rate
Previous research has shown several distinct advantages of increased frame rates: enhanced depth perception for both non-expert [13] and expert [14] viewers; improved realism; more continuous motion; a decrease in perceptible motion blur [15]; reduced visibility of temporal aliasing artifacts [16] for frame rates up to 240fps; improved perceptual quality [15]; enhanced spatial and speed discrimination [17]; more realistic picture quality [18]; and reduced stress levels for the viewer (implied by a lower blinking frequency [19]). HFR also improves the capability of capturing slow-motion playback videos [20]. An experimental setup is shown in [2] that fully eradicates temporal aliasing artifacts in some scenarios at frame rates close to 900fps.

However, despite these advantages, HFR content may be desirable mainly for representing a "hyper-realistic" scene (e.g., sports programming), as it may conflict with the "cinematic look" associated with lower frame rates. Content providers and directors currently have limited choice in this matter (frame rates in legacy formats have remained static for many years); consequently, the selection of frame rate, enabled by the application of temporal down-sampling techniques, can be regarded as an artistic choice [2].
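In the BVI-HFR database used in this study, the lower frame rates were produced by temporal down-sampling with frame averaging [12]. A minimal NumPy sketch of that operation follows, assuming decoded frames are held in one array and that the source rate is an integer multiple of the target rate (120 to 60, 30, or 15fps). The function name is hypothetical, and the sketch returns floating-point frames because the rounding used when the averaged frames were written back to 8-bit video is not stated in the source.

```python
import numpy as np


def downsample_by_averaging(frames: np.ndarray, src_fps: int, dst_fps: int) -> np.ndarray:
    """Temporally down-sample a clip by averaging groups of consecutive frames.

    frames: array of shape (T, H, W) or (T, H, W, C).
    Assumes src_fps is an integer multiple of dst_fps. Trailing frames that do not
    fill a complete group are dropped (a choice made for this sketch).
    """
    if src_fps % dst_fps != 0:
        raise ValueError("src_fps must be an integer multiple of dst_fps")
    k = src_fps // dst_fps                  # input frames merged into one output frame
    t = (frames.shape[0] // k) * k          # largest frame count divisible by k
    groups = frames[:t].astype(np.float64).reshape(t // k, k, *frames.shape[1:])
    return groups.mean(axis=1)              # average within each group


# Example: a 10 s, 120fps clip (1200 frames) becomes 150 frames at 15fps.
clip_120 = np.zeros((1200, 4, 4), dtype=np.uint8)
clip_15 = downsample_by_averaging(clip_120, 120, 15)
print(clip_15.shape)  # (150, 4, 4)
```

Averaging (rather than simply dropping frames) integrates motion over the longer frame interval, which is how a camera with a correspondingly longer shutter would behave.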
B. Video Databases - High Frame Rates (HFR)
Very few HFR FHD video databases are publicly available [2]. Previous studies used either single frame rates or comparatively low frame rates, e.g., 24fps [21] and 30fps [1], [5], [22]–[25]. In contrast, few studies have focused on frame rates above 50fps [24] and 60fps [5].
C. Video Compression and Configuration
Frame-rate-related video quality and compression at different QP values have been studied for almost two decades. These efforts can be roughly categorized into two main classes based on their goals. The first class is concerned with viewing conditions and the artifacts perceived when video is displayed at different frame rates. The second class concentrates on efficient video compression techniques that decrease the coding bit rate with little quality degradation [5].

To investigate the impact of frame rate and QP on the perceptual quality of a video, the product of a spatial quality factor (SQF) and a temporal correction factor (TCF) is used in [9]. The SQF estimates the quality of the decoded frames, while the TCF reduces the quality given by the first factor according to the frame rate. A high correlation was achieved between the subjective assessment and the proposed content-dependent metric [9].

A subjective analysis was conducted in [11] to evaluate the effect of frame rate and H.264 compression on perceived video quality. It showed a direct relation between frame rate and video quality, while the dependencies on QP level, spatial resolution, and video content statistics were also important [11]. A rate-distortion optimization that adaptively determines the QPs for a group of neighboring frames, implemented in H.265/HEVC to decrease coding distortion, was shown to reduce both the BD-rate and quality fluctuation [26].

Besides video compression, there has also been considerable recent progress in codec comparisons. An objective analysis comparing the performance of the H.264/MPEG-AVC, H.265/MPEG-HEVC, and VP9 video encoders on gaming videos for live streaming applications was conducted in [21]. A subjective and objective assessment for UHD and TV broadcast scenarios [27] investigated the coding efficiency of HEVC, AVC, and VP9 and indicated that HEVC outperforms the other encoders. For UHD, FHD, and HD videos [28], the coding efficiency and end-user quality were examined for codecs such as H.264, H.265, VP8, and VP9. Similarly, a detailed subjective and objective analysis [29] for real-time applications using the HEVC and VP9 encoders showed better compression performance for HEVC.

Fig. 1: A sample frame from each of the 22 video sequences in the BVI-HFR video database, along with the names and associated indices.

III. VIDEO DATABASE AND ENCODING CONFIGURATION
The 120fps BVI-HFR video database [10] was used to compare the performance of the H.265/HEVC and VP9 encoders and to investigate the impact of HFR on the perceptual quality of compressed videos.
A. Video Database
The BVI-HFR video database [10] comprises 22 unique uncompressed video sequences at FHD resolution (1920 × 1080), each 10 s long, at a frame rate of 120fps. Each video sequence was further temporally down-sampled by frame averaging to 60fps, 30fps, and 15fps, resulting in a total of 88 sequences. Fig. 1 shows a sample frame from each sequence with its name and index; the database can be downloaded from https://vilab.blogs.ilrt.org/?p=1563.

B. Content Description
The encoding complexity of a video sequence and its compression difficulty depend on the complexity of the content, which is characterized by the Spatial Information (SI) and Temporal Information (TI) indices. SI measures the amount of edge energy and thus reflects the spatial detail, while TI reflects the magnitude of temporal changes. Fig. 2 presents the spatial and temporal content of the videos in the BVI-HFR database. All 22 source sequences at 120fps are used to compute the SI and TI values for the BVI-HFR database. Based on the SI and TI results, five video sequences from the database were selected for the assessment [2]. From the database, the temporally down-sampled versions at 60fps, 30fps, and 15fps were selected to investigate the impact of compression by encoding the videos with the H.265/HEVC and VP9 encoders at different QP values.
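SI and TI are defined in ITU-T P.910 [31]: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luminance frame, and TI is the maximum over time of the spatial standard deviation of the difference between successive luminance frames. The sketch below assumes the luma (Y) planes are already decoded into a NumPy array of shape (frames, height, width); the function names are hypothetical, and the border handling (SciPy's default reflection) is a choice of this sketch rather than something the source specifies.

```python
import numpy as np
from scipy import ndimage


def spatial_information(luma: np.ndarray) -> float:
    """SI: max over frames of the spatial std-dev of the Sobel gradient magnitude."""
    values = []
    for frame in luma.astype(np.float64):
        gx = ndimage.sobel(frame, axis=1)   # horizontal derivative
        gy = ndimage.sobel(frame, axis=0)   # vertical derivative
        values.append(np.hypot(gx, gy).std())
    return float(max(values))


def temporal_information(luma: np.ndarray) -> float:
    """TI: max over frames of the spatial std-dev of successive frame differences."""
    luma = luma.astype(np.float64)
    diffs = luma[1:] - luma[:-1]            # M_n = F_n - F_(n-1)
    return float(max(d.std() for d in diffs))


# Usage with random 8-bit luma frames standing in for a decoded clip.
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(5, 64, 64))
print(spatial_information(frames), temporal_information(frames))
```

Each source sequence then becomes one (TI, SI) point of the kind plotted in Fig. 2.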
C. Video Compression and Encoder Settings
For our analysis, the temporally down-sampled 60fps, 30fps, and 15fps versions were selected from the BVI-HFR database, resulting in a total of 66 video sequences. All video sequences were FHD with a resolution of 1920 × 1080 in eight-bit YUV 4:2:0 format. At each frame rate, five of these sequences were then encoded using HEVC and VP9 at five QPs: 27, 31, 35, 39, and 43. For example, at 60fps, five videos were selected based on the SI-TI plot shown in Fig. 2 and encoded by HEVC at five QP levels, resulting in 25 encoded sequences. The same process was applied at 30fps and 15fps, resulting in a total of 75 encoded sequences. The same encoding process was used for VP9, resulting in another 75 encoded video sequences. Therefore, 150 encoded video sequences were obtained with both encoders.

TABLE I: Encoder settings and configuration for different QP levels

H.265/HEVC (libx265 v2.7.0)
Pass 1: ffmpeg -i (INPUT) -c:v libx265 -x265-params pass=1 -strict experimental -b:v 8000k -minrate 800k -maxrate 8000k -pix_fmt yuv422p -preset medium -an -f mp4 /dev/null
Pass 2: ffmpeg -i (INPUT) -c:v libx265 -x265-params pass=2:qp=(27 to 43) -c:a aac -strict experimental -b:v 8000k -minrate 800k -maxrate 8000k -pix_fmt yuv422p -preset medium -an output.mp4

VP9/WebM (libvpx v1.7.0)
Pass 1: ffmpeg -y -i (INPUT) -c:v libvpx-vp9 -b:v 8000k -pass 1 -c:a opus -b:a 64k -f webm /dev/null
Pass 2: ffmpeg -i (INPUT Pass1) -c:v libvpx-vp9 -b:v 8000k -pass 2 -c:a opus -b:a -qmin 24 -qmax 26 (for QP 27) -f webm output.webm

Fig. 2: Spatial-temporal (SI vs. TI) plot of the 22 BVI-HFR source sequences (120fps, 1920 × 1080).

In our study, FFmpeg was used as the wrapper for the open-source encoder libraries libx265 and libvpx-vp9 for the H.265/HEVC and VP9 codecs, respectively. The details of the encoder settings are listed in Table I.
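The number of test conditions given in this subsection follows directly from the factorial design (encoder × frame rate × content × QP). The short enumeration below reproduces those counts; the content names are the five clips listed in Section VI, and the variable names are illustrative only.

```python
from itertools import product

encoders = ["H.265/HEVC", "VP9"]
frame_rates = [15, 30, 60]                      # fps
clips = ["Books", "Catchtrack", "Hamster", "Sparkler", "Watersplashing"]
qps = [27, 31, 35, 39, 43]

source_pool = 22 * len(frame_rates)             # down-sampled sequences available: 66

# Every (encoder, frame rate, content, QP) combination is one EVS.
conditions = list(product(encoders, frame_rates, clips, qps))

per_encoder_per_rate = len(clips) * len(qps)            # 5 x 5
per_encoder = per_encoder_per_rate * len(frame_rates)   # x 3 frame rates
print(per_encoder_per_rate, per_encoder, len(conditions))  # 25 75 150
```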
To balance the two encoder configurations, the medium preset was selected instead of the ultrafast or veryfast presets, targeting comparable encoding efficiency and quality at some cost in compression efficiency.

IV. SUBJECTIVE EVALUATION
This section details the experimental procedure used to quantify the relation between QP and perceptual quality.
A. Video Contents
This section describes the techniques and materials used to conduct the subjective tests. The participants were selected based on ITU recommendations, such as ITU-R BT.500-13 [30] and ITU-T P.910 [31]. Five video sequences were selected after analyzing the SI and TI values plotted in Fig. 2. From the BVI-HFR database, the temporally down-sampled 60fps, 30fps, and 15fps versions, each containing 22 video sequences, were considered, and the selected sequences were encoded with H.265/HEVC and VP9 at the five QP levels discussed in Section III.

In the rest of the paper, the following terms are used:
• Source sequence (SRC): an original, unimpaired (uncompressed) video sequence.
• Encoded video sequence (EVS): an encoded (compressed), i.e., impaired, video sequence.
• Clip: any video sequence, i.e., either an SRC or an EVS.

TABLE II: Opinion score rating scale
Visual quality | Error perceptibility | Opinion score | Normalized score
Excellent | Imperceptible | 5 | 80-100
Good | Perceptible but not annoying | 4 | 60-80
Fair | Slightly annoying | 3 | 40-60
Poor | Annoying | 2 | 20-40
Bad | Very annoying | 1 | 0-20
B. Experimental Setup and Approach
For the subjective assessment, a laboratory was specifically set up, containing only materials relevant to the tests, based on the ITU-T P.910 recommendations [31]. A calibrated Newsync X24C LCD monitor with a spatial resolution of × (24 inches), a peak luminance of 300 cd/m², a refresh rate of 144 Hz, and a static contrast ratio of 1,000:1 was used for the experiment, together with a wireless mouse. A high-performance system running the subjective video quality assessment (VQA) software provided by Moscow State University (MSU) [32] was used in the test; this software is freely accessible for research and educational purposes. The viewing distance between the participant and the LCD monitor was maintained at 76 cm according to the ITU-T P.910 recommendations [31].
Before the experiment, a training session was conducted to familiarize each participant with the testing process. The session covered the testing methodology, and any concerns or queries were answered. The participants were shown two video sequences with the same specifications but not from the BVI-HFR database, and subjective scores were recorded using the approach discussed below. A total of 18 participants with an average of (± σ) 24.6 ±

C. Opinion Scores Analysis
To evaluate clip quality with the DSCQS Type-II method, a continuous rating in terms of opinion scores from 0 to 100 (from very annoying to imperceptible) was recorded, as shown in Table II. With the DSCQS Type-II method [30], opinion scores were recorded on the five-point vertical scale recommended by ITU-R BT.500-12 for each participant, each frame rate (15fps, 30fps, and 60fps), and each encoder (H.265/HEVC and VP9). Each participant was shown two clips simultaneously, an SRC and an EVS, without being told which was which. After viewing the clips, the participants were asked to rate each clip separately; these ratings are the opinion scores (OSs).
D. Scoring Method
As explained earlier, we applied the DSCQS Type-II method for subjective evaluation in this study. The OSs recorded on the five-point rating scale were transformed to a normalized scale ranging from 0 to 100. DMOS values, which are typically used for such investigations, are calculated from the mean opinion score (MOS) as follows:
$$\mathrm{DMOS}_{v} = \mathrm{MOS}^{\mathrm{SRC}}_{v} - \mathrm{MOS}^{\mathrm{EVS}}_{v} \tag{1}$$

where $\mathrm{MOS}^{\mathrm{SRC}}_{v}$ and $\mathrm{MOS}^{\mathrm{EVS}}_{v}$ are the MOSs of the source and encoded video sequences of the $v$th clip, respectively. The MOS for each frame rate and clip is computed from the OSs given by all participants and can be expressed in general form as follows:
$$\mathrm{MOS}_{v} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{OS}_{i,v} \tag{2}$$

where $N$ is the total number of participants in the subjective test, and $\mathrm{OS}_{i,v}$ is the normalized opinion score recorded by test subject $i = 1, 2, \ldots, N$ for the $v$th clip. Fig. 3 and Fig. 4 present the results of the subjective test conducted by 18 participants for the five selected clips at three frame rates (15fps, 30fps, and 60fps), encoded with H.265/HEVC and VP9 at five QP levels (27, 31, 35, 39, and 43), in the form of DMOS vs. QP.

V. FULL-REFERENCE VIDEO QUALITY ASSESSMENT (VQA) METRICS
This section briefly explains the full-reference (FR) objective VQA metrics used in our analysis at the different frame rates.
A. Peak Signal-to-Noise Ratio
The peak signal-to-noise ratio (PSNR) is a statistical, measurement-based model that computes the mean squared error (MSE) over the pixels of each frame of a clip [33]. The resulting MSE is then treated as noise to calculate the signal-to-noise ratio. Mathematically, MSE and PSNR are defined as follows:
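$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\left[I(i,j) - K(i,j)\right]^2 \tag{3}$$

$$\mathrm{PSNR} = 20\log_{10}\left(\frac{\mathrm{MAX}_I}{\sqrt{\mathrm{MSE}}}\right) \tag{4}$$

where $I$ represents the source frame with dimensions $M \times N$, $K$ is its approximation (the distorted frame), and $\mathrm{MAX}_I$ is the maximum possible pixel value (255 for 8-bit video).

B. Structural Similarity Index Metric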
The structural similarity index (SSIM) computes clip quality based on the structural similarity (luminance, contrast, and structure comparisons) between the source and distorted clips [34]. SSIM can be calculated as follows:
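$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{5}$$

where $x$ and $y$ are the distorted and source frames from the clips, respectively; $\mu_x$ and $\mu_y$ are their means, $\sigma_x^2$ and $\sigma_y^2$ their variances, $\sigma_{xy}$ their covariance, and $C_1$ and $C_2$ small constants that stabilize the division.

C. Multi-scale SSIM Index Metric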
The multi-scale SSIM (MS-SSIM) index is an improved form of the SSIM metric that computes the quality of a frame of a clip at multiple scales [35]. With scales indexed by $j = 1, \ldots, M$, the luminance comparison is computed only at the coarsest scale $M$, while the contrast and structure comparisons are computed at each scale $j$.

D. Visual Signal-to-Noise Ratio
Based on the near-threshold and suprathreshold properties of human vision, the visual signal-to-noise ratio (VSNR) quantifies the visual fidelity of the frames of a clip. Distortions below the visual detection threshold are treated as imperceptible, while suprathreshold distortions are mapped to a measure of the clip quality [36].
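Eqs. (3)-(5) in subsections A and B can be checked numerically with a short NumPy sketch. Two assumptions are made: frames are 8-bit, so MAX_I = 255, and SSIM is evaluated once over the whole frame with the commonly used constants C1 = (0.01·255)² and C2 = (0.03·255)². The reference SSIM implementation [34] instead averages Eq. (5) over local Gaussian windows, so this global variant illustrates the formula rather than reproducing published values; the function names are hypothetical.

```python
import numpy as np


def psnr(ref, dist, max_val=255.0):
    """Eqs. (3)-(4): PSNR in dB between a source frame and a distorted frame."""
    ref = np.asarray(ref, dtype=np.float64)
    dist = np.asarray(dist, dtype=np.float64)
    mse = np.mean((ref - dist) ** 2)        # Eq. (3)
    if mse == 0:
        return float("inf")                 # identical frames
    return float(20.0 * np.log10(max_val / np.sqrt(mse)))  # Eq. (4)


def ssim_global(x, y, max_val=255.0, k1=0.01, k2=0.03):
    """Eq. (5) evaluated once over the whole frame (no local windowing)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)


rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64))
noisy = np.clip(ref + rng.normal(0, 5, ref.shape), 0, 255)
print(psnr(ref, noisy), ssim_global(ref, noisy))
```

A clip-level score can then be taken as the mean of the per-frame values. Setting k1 = k2 = 0 makes the global SSIM expression coincide with the UQI of Eq. (8).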
E. Information Fidelity Criterion
The information fidelity criterion (IFC) is a natural scene statistics (NSS)-based method in which the source clip is transformed to the wavelet domain and information is extracted based on NSS [37]. The same procedure is followed for the distorted clip. The two extracted quantities are then combined into a single measure of the visual quality of the clip. IFC can be computed as follows:
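$$\mathrm{IFC} = \sum_{k \in \text{subbands}} I\left(C^{N_k,k}; D^{N_k,k} \mid s^{N_k,k}\right) \tag{6}$$

where $C^{N_k,k}$ represents the $N_k$ coefficients of the $k$th subband from the random field (RF) of the source clip [37]; similarly, $D^{N_k,k}$ and $s^{N_k,k}$ denote the corresponding coefficients of the distorted clip and of the scalar multiplier field.

F. Visual Information Fidelity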
The visual information fidelity (VIF) transforms each frame of both the distorted and source clips into the wavelet domain and quantifies the information that can be extracted from them. Both quantities are based on a model of the human visual system (HVS), and their ratio gives the visual quality of the distorted frame [38].
$$\mathrm{VIF} = \frac{\sum_{j \in \text{subbands}} I\left(\vec{C}^{N,j}; \vec{F}^{N,j} \mid s^{N,j}\right)}{\sum_{j \in \text{subbands}} I\left(\vec{C}^{N,j}; \vec{E}^{N,j} \mid s^{N,j}\right)} \tag{7}$$

where the sums run over the subbands of interest; $E$ and $F$ represent the source and distorted frames, respectively; and $\vec{C}^{N,j}$ denotes the $N$ elements of the random field $C_j$ representing the subband coefficients [38].

G. Pixel-based VIF
The pixel-based VIF (VIFP) is a less complex version of the VIF metric that extracts information at the pixel level from the frames of both the source and distorted clips [38].
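A compact sketch of the pixel-domain VIF, following the structure of the widely distributed multi-scale reference code associated with [38]: four scales with Gaussian windows of size 17, 9, 5, and 3 (standard deviation equal to one fifth of the window size), down-sampling by two between scales, and an HVS noise variance σ_n² = 2. These constants are assumptions to be checked against the reference release, and the function names are hypothetical.

```python
import numpy as np
from scipy.signal import convolve2d


def gaussian_window(n: int, sigma: float) -> np.ndarray:
    """Normalized n x n Gaussian kernel."""
    ax = np.arange(n) - (n - 1) / 2.0
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def vifp(ref, dist, sigma_nsq=2.0, eps=1e-10):
    """Pixel-domain VIF: information in the distorted channel over that in the reference."""
    ref = np.asarray(ref, dtype=np.float64)
    dist = np.asarray(dist, dtype=np.float64)
    num = den = 0.0
    for scale in range(1, 5):
        n = 2 ** (4 - scale + 1) + 1                  # window sizes 17, 9, 5, 3
        win = gaussian_window(n, n / 5.0)
        if scale > 1:                                  # low-pass, then down-sample by 2
            ref = convolve2d(ref, win, mode="valid")[::2, ::2]
            dist = convolve2d(dist, win, mode="valid")[::2, ::2]
        mu1 = convolve2d(ref, win, mode="valid")
        mu2 = convolve2d(dist, win, mode="valid")
        s1 = np.maximum(convolve2d(ref * ref, win, mode="valid") - mu1 ** 2, 0.0)
        s2 = np.maximum(convolve2d(dist * dist, win, mode="valid") - mu2 ** 2, 0.0)
        s12 = convolve2d(ref * dist, win, mode="valid") - mu1 * mu2

        g = s12 / (s1 + eps)                           # gain of the distortion channel
        sv = s2 - g * s12                              # variance of the additive noise
        flat = s1 < eps                                # no reference signal locally
        g[flat] = 0.0
        sv[flat] = s2[flat]
        s1[flat] = 0.0
        blank = s2 < eps                               # distorted patch is flat
        g[blank] = 0.0
        sv[blank] = 0.0
        neg = g < 0                                    # negative gain treated as pure noise
        sv[neg] = s2[neg]
        g[neg] = 0.0
        sv = np.maximum(sv, eps)

        num += np.sum(np.log10(1.0 + g ** 2 * s1 / (sv + sigma_nsq)))
        den += np.sum(np.log10(1.0 + s1 / sigma_nsq))
    return float(num / den)


rng = np.random.default_rng(0)
ref = rng.uniform(0, 255, (64, 64))
noisy = ref + rng.normal(0, 20, ref.shape)
print(vifp(ref, ref), vifp(ref, noisy))
```

Frames must be large enough for the four "valid" convolutions to leave non-empty maps (64 × 64 is sufficient here); a completely flat reference frame would make the denominator zero.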
H. Universal Quality Index
This VQA metric measures the structural distortions in a clip and then maps these measurements into a model for predicting the visual quality of the clip. In the universal quality index (UQI) metric, the quality $Q$ can be calculated as follows:

$$Q = \frac{4\sigma_{xy}\mu_x\mu_y}{(\sigma_x^2 + \sigma_y^2)(\mu_x^2 + \mu_y^2)} \tag{8}$$

where $x$ and $y$ are the source and distorted frames from the clips, respectively.

I. Noise Quality Measure
This VQA metric considers variations in the local luminance mean, contrast sensitivity, and contrast measures of the clip [39]. The noise quality measure (NQM) is a weighted SNR measure between the source and encoded clips and can be calculated as follows:
$$\mathrm{NQM} = 10\log_{10}\left(\frac{\sum_x\sum_y O_s^2(x,y)}{\sum_x\sum_y \left(O_s(x,y) - I_s(x,y)\right)^2}\right) \tag{9}$$

where $O_s(x,y)$ and $I_s(x,y)$ describe the simulated versions of the restored and source frames, respectively [39].

J. Weighted Signal-to-Noise Ratio
This VQA metric uses the contrast sensitivity function (CSF) to define the weighted signal-to-noise ratio (WSNR) as the ratio of the average weighted signal power to the average weighted noise power, expressed in dB.
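The source does not state which CSF was used for WSNR. The sketch below assumes the Mannos-Sakrison CSF, a common choice, applied as a weight on the 2-D DFT of the reference frame and of the error frame. The pixels_per_degree parameter, which converts DFT frequencies to cycles per degree, is a hypothetical input that depends on the display pixel pitch and the 76 cm viewing distance, so values from this sketch will differ from those of any particular toolbox.

```python
import numpy as np


def csf_mannos_sakrison(f_cpd):
    """Mannos-Sakrison contrast sensitivity at spatial frequency f (cycles/degree)."""
    return 2.6 * (0.0192 + 0.114 * f_cpd) * np.exp(-((0.114 * f_cpd) ** 1.1))


def wsnr(ref, dist, pixels_per_degree=60.0):
    """CSF-weighted SNR in dB: weighted signal power over weighted noise power."""
    ref = np.asarray(ref, dtype=np.float64)
    dist = np.asarray(dist, dtype=np.float64)
    h, w = ref.shape
    # DFT bin frequencies in cycles/pixel, scaled to cycles/degree.
    fy = np.fft.fftfreq(h)[:, None] * pixels_per_degree
    fx = np.fft.fftfreq(w)[None, :] * pixels_per_degree
    weight = csf_mannos_sakrison(np.hypot(fx, fy))
    signal = np.sum(np.abs(np.fft.fft2(ref) * weight) ** 2)
    noise = np.sum(np.abs(np.fft.fft2(ref - dist) * weight) ** 2)
    if noise == 0:
        return float("inf")                 # identical frames
    return float(10.0 * np.log10(signal / noise))


rng = np.random.default_rng(1)
ref = rng.uniform(0, 255, (32, 32))
print(wsnr(ref, ref + rng.normal(0, 5, ref.shape)))
```

Because the summed powers share the same number of DFT bins, the ratio of sums equals the ratio of averages described in the text.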
K. Video Multimethod Assessment Fusion
The video multimethod assessment fusion (VMAF) is an FR metric that evaluates the quality of a clip with respect to artifacts such as compression and scaling. VMAF combines existing image quality metrics, such as VIF, the Detail Loss Metric, the Mean Co-Located Pixel Difference, and the anti-noise signal-to-noise ratio, to predict the clip quality [40].

The FR VQA metrics explained in this section are freely available online, and more details on their mathematical formulation can be found in [33] and in each respective reference. In this study, we computed these FR metrics using the recommended parameters given in each corresponding reference.

VI. RESULTS AND DISCUSSION
A. Subjective Quality Assessment Measurements
This section describes in detail the subjective measurements for the 15fps, 30fps, and 60fps clips encoded with the H.265/HEVC and VP9 encoders. For each frame rate, the subjective measurements are given as DMOS values against the QP levels for both encoders.
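The DMOS values reported in this section follow Eqs. (1) and (2) of Section IV. A minimal sketch, assuming the normalized (0-100) opinion scores are arranged as a participants × clips array; the scores in the example are invented purely for illustration and are not data from this study.

```python
import numpy as np


def mos(scores):
    """Eq. (2): mean normalized opinion score over the N participants (axis 0)."""
    return np.asarray(scores, dtype=np.float64).mean(axis=0)


def dmos(src_scores, evs_scores):
    """Eq. (1): DMOS = MOS(SRC) - MOS(EVS); larger values mean stronger impairment."""
    return mos(src_scores) - mos(evs_scores)


# Hypothetical normalized scores (0-100) from three participants for one clip.
src = [90, 85, 95]
evs = [60, 70, 65]
print(dmos(src, evs))  # 25.0
```

Because the mean is linear, averaging per-participant differences (SRC minus EVS) gives the same DMOS as differencing the two MOS values.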
1) Subjective Test Results for H.265/HEVC:
For the subjec-tive quality assessment, five clips (Books, Catchtrack, Ham-ster, Sparkler, and Watersplashing) are chosen based on the SIand TI plotting shown in Fig. 2. These five videos are encodedusing an HEVC encoder, as explained in section III. A detailedsubjective quality assessment for all three frame rates (15fps,30fps, and 60fps) using the subjective perceptual video qualitytool provided by MSU is conducted [32].The subjective test comprised of showing the selected par-ticipants SRCs and EVSs, but belonging to the same categoryfor example (Books). The subjective assessment is conductedfor each frame rate, i.e., 15fps, 30fps, and 60fps, resultingin a total of 75 EVSs. Then the participants are asked torecord their OS using DSCQS type-II, where the DMOS valuesby each participant, clip, and at each QP level are obtainedusing (1) and (2), explained in section IV. Fig. 3 shows thesubjective measurements as DMOS values for all three framerates using the HEVC encoder at each QP level. It can beseen that for all five EVSs, the DMOS values change withthe increasing QP level. Higher QP levels tend to deterioratethe EVSs quality, resulting in higher DMOS values. Moreover,HFRs tend to show better quality than low frame rates. Themain reason for selecting the three frame rates is to investigatethe impact of different levels of compression by employingdifferent encoders, for determining a frame rate that can reflectbetter user-perceived visual quality.In Fig. 3, for all frame rates, a uniform increase can beobserved in the DMOS values as the QP increases, where aQP range of 27-31 shows a uniform behavior of DMOS values,
RE-PRINT SUBMITTED TO ARXIV.ORG 7
27 31 35 39 43 QP D M O S BookCatchtrackHamsterSparklerWatersplashing (a)
27 31 35 39 43 QP D M O S BookCatchtrackHamsterSparklerWatersplashing (b)
27 31 35 39 43 QP D M O S BookCatchtrackHamsterSparklerWatersplashing (c)
Fig. 3: Subjective evaluation corresponding different frame rates encoded with H.265/HEVC encoder as DMOS valuesagainst QPs levels, where (a) 15fps, (b) 30fps, and (c) 60fps.
27 31 35 39 43 QP D M O S BookCatchtrackHamsterSparklerWatersplashing (a)
27 31 35 39 43 QP D M O S BookCatchtrackHamsterSparklerWatersplashing (b)
27 31 35 39 43 QP D M O S BookCatchtrackHamsterSparklerWatersplashing (c)
Fig. 4: Subjective evaluation for different frame rates encoded with the VP9 encoder: DMOS values against QP levels, where (a) 15fps, (b) 30fps, and (c) 60fps.
i.e., corresponding to Excellent quality for all EVs according to Table II. Fig. 3 also shows that the DMOS values differ across EVSs because of their different SI and TI values (from low to high). For example, Catchtrack and Watersplashing show higher DMOS values for QP within 35-39 at 15fps and 30fps, as depicted in Figs. 3(a) and 3(b), reflecting Poor quality according to Table II. These figures also show that the DMOS values for Catchtrack and Watersplashing deteriorate further for QP within 39-43, reflecting Bad quality. Furthermore, for QP within 35-39, the DMOS values for Hamster and Sparkler at 15fps and 30fps, shown in Figs. 3(a) and 3(b), respectively, reflect Good quality, while those for QP within 39-43 reflect Fair quality, according to Table II.
Fig. 3(c) shows that, at 60fps, even for the higher QP levels within 39-43, the DMOS values reflect Fair quality for all EVSs, according to Table II. Note that EVSs such as Books, Hamster, and Sparkler show lower DMOS values for QP within 27-35, reflecting Excellent quality, and for QP within 35-39, reflecting Good quality, according to Table II and Fig. 3(c). However, for QP within 39-43, the quality reflected by the DMOS values for Watersplashing is Fair at 60fps, as shown in Fig. 3(c).
From the subjective measurements in terms of DMOS values shown in Fig. 3, there is a clear impact of frame rate at different compression (QP) levels on the user-perceived quality. HFR clips such as those in Fig. 3(c) exhibit Good visual quality even when compressed at higher QP levels. Thus, when a network has low bit-rate requirements, H.265/HEVC preserves the perceptual quality of the EVs even for QP levels within 35-43. However, mixed DMOS values are reported at 15fps and 30fps for QP levels within 31-39 across EVSs, as shown in Figs. 3(a) and 3(b).
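The DMOS values in Figs. 3 and 4 are differential scores between the hidden reference and each processed clip. The exact procedure (rating scale, subject screening, any normalization) is not restated in this section, so the following is only a minimal sketch, assuming the basic definition in which each subject's difference score is the reference rating minus the processed-clip rating, averaged over subjects. The function name `dmos` and the example ratings are hypothetical.

```python
from statistics import mean


def dmos(reference_scores, processed_scores):
    """Differential mean opinion score for one processed clip.

    Assumes paired ratings: reference_scores[i] and processed_scores[i]
    come from the same subject. Higher DMOS means a larger perceived
    degradation relative to the reference. Subject screening and
    z-score normalization are deliberately omitted in this sketch.
    """
    if len(reference_scores) != len(processed_scores) or not reference_scores:
        raise ValueError("need one reference and one processed rating per subject")
    # Per-subject difference score: reference rating minus processed rating.
    diffs = [r - p for r, p in zip(reference_scores, processed_scores)]
    return mean(diffs)


# Hypothetical ratings from four subjects on a 5-point scale.
ref = [5, 5, 4, 5]
qp43 = [2, 3, 2, 2]
print(dmos(ref, qp43))  # 2.5
```

A larger DMOS therefore corresponds to a worse category in a quality mapping such as Table II.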
2) Subjective Test Results for VP9:
Similarly, for the subjective quality assessment, five clips (Books, Catchtrack, Hamster, Sparkler, and Watersplashing) are chosen based on the SI and TI plot shown in Fig. 2 and encoded using the VP9 encoder, as explained in Section III. A detailed subjective quality assessment for all three frame rates is conducted using the subjective perceptual video quality tool provided by MSU [32].
Fig. 4 depicts the subjective measurements as DMOS values against each QP level, using the VP9 encoder for the 15fps, 30fps, and 60fps clips. The subjective assessment is conducted for each frame-rate clip, resulting in 75 EVSs in total (5 clips × 5 QP levels × 3 frame rates). In Fig. 4, for all frame rates, a uniform increase in the DMOS values can be observed as the QP increases, and the QP range 27-31 shows uniform DMOS values, i.e., corresponding to Excellent quality for all EVSs according to Table II. However, the DMOS values differ across EVSs. For example, at 15fps (Fig. 4(a)) and 30fps (Fig. 4(b)), Catchtrack and Watersplashing result in Poor and Fair quality, respectively, for QP within 35-39, according to Table II. Note that EVSs such as Books, Hamster, and Sparkler at 15fps and 30fps, shown in Figs. 4(a) and 4(b), reflect Good quality for QP within 31-35, according to Table II. For the 60fps EVSs shown in Fig. 4(c), all EVs reflect Good quality for QP within 35-43, except Catchtrack, which reflects Fair quality, according to Table II. However, for QP within 27-31, all EVSs reflect Excellent quality, according to Table II and Fig. 4(c).
From the subjective measurements in terms of DMOS values shown in Fig. 4, there is a clear impact of frame rate at different compression (QP) levels on the user-perceived quality. HFR clips such as those in Fig. 4(c) exhibit Good visual quality even when compressed at QP levels within 35-43. Thus, when a network has low bit-rate requirements, VP9 preserves the perceptual quality of the EVs even at QP levels within 35-43. However, mixed DMOS values are reported at 15fps and 30fps for QP levels within 31-39 across EVs, as shown in Figs. 4(a) and 4(b).
TABLE III: FR-VQM objective quality models corresponding to each clip at 15fps for H.265/HEVC.
[Sub-tables: (a) Books, (b) Catchtrack, (c) Hamster, (d) Sparkler, (e) Watersplashing. Columns: QP level, PSNR, SSIM, MS-SSIM, VSNR, WSNR, NQM, UQI, VIF, VIFP, IFC, VMAF. Surviving QP 27 entries (PSNR in dB, SSIM): (a) 38.8320, 0.9248; (b) 37.2140, 0.8681; (c) 38.2380, 0.9203; (d) 38.6280, 0.9463; (e) 37.9580, 0.8876. The remaining entries are not recoverable.]
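The five clips were selected by their spatial information (SI) and temporal information (TI), plotted in Fig. 2. As an illustration only, the sketch below implements the SI/TI definitions of ITU-T Rec. P.910: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered frame, and TI is the maximum over time of the spatial standard deviation of the frame-to-frame luma difference. It assumes frames are small 2-D lists of 8-bit luma samples (at least 3x3 for SI), skips border pixels, and uses the population standard deviation; whether Fig. 2 was produced exactly this way is an assumption, and a practical version would use NumPy or OpenCV.

```python
import math
from statistics import pstdev


def sobel_magnitudes(frame):
    """Sobel gradient magnitude at every interior pixel (border pixels are skipped)."""
    h, w = len(frame), len(frame[0])
    mags = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical 3x3 Sobel responses.
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            mags.append(math.hypot(gx, gy))
    return mags


def spatial_information(frames):
    """SI: maximum over frames of the spatial std of the Sobel-filtered frame."""
    return max(pstdev(sobel_magnitudes(f)) for f in frames)


def temporal_information(frames):
    """TI: maximum over consecutive frame pairs of the spatial std of the luma difference."""
    return max(
        pstdev([c - p for row_c, row_p in zip(cur, prev) for c, p in zip(row_c, row_p)])
        for prev, cur in zip(frames, frames[1:])
    )


# Tiny synthetic clip: the top row of a 2x2 frame brightens by 10.
a = [[0, 0], [0, 0]]
b = [[10, 10], [0, 0]]
print(temporal_information([a, a, b]))  # 5.0
```

Taking the maximum over time means a single abrupt change (such as a scene cut) dominates TI, which is one reason high-motion clips like Catchtrack sit at the top of the TI axis.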
B. Performance of Objective Models
This section quantifies the relation to be fitted between the FR-VQA metrics and the subjective measurements, expressed as DMOS values, described in Sections V and VI, respectively.
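Of the 11 FR metrics, PSNR is the simplest and serves as a baseline for the others. A minimal sketch for 8-bit content follows, assuming frames are given as flat sequences of luma samples and that the sequence score is the mean of per-frame PSNR values; this pooling is one common convention, and the paper's exact pooling and colour handling are not stated in this section. The function names are hypothetical.

```python
import math


def psnr(ref, dist, peak=255):
    """PSNR in dB between two equally sized frames given as flat sample sequences."""
    if len(ref) != len(dist) or not ref:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)


def sequence_psnr(ref_frames, dist_frames):
    """Mean of per-frame PSNR values (assumed pooling; inf if any frame is identical)."""
    scores = [psnr(r, d) for r, d in zip(ref_frames, dist_frames)]
    return sum(scores) / len(scores)


# Every sample off by one grey level -> MSE = 1 -> 20*log10(255) ≈ 48.13 dB.
print(round(psnr([100] * 16, [101] * 16), 2))  # 48.13
```

Because PSNR depends only on the pixel-wise MSE, it ignores where the errors occur, which is why the perceptually motivated metrics (SSIM, VIF, VMAF) are benchmarked alongside it.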
TABLE IV: FR-VQM objective quality models corresponding to each clip at 30fps for H.265/HEVC.
[Sub-tables: (a) Books, (b) Catchtrack, (c) Hamster, (d) Sparkler, (e) Watersplashing. Columns: QP level, PSNR, SSIM, MS-SSIM, VSNR, WSNR, NQM, UQI, VIF, VIFP, IFC, VMAF. Surviving QP 27 entries (PSNR in dB, SSIM): (a) 44.7351, 0.9441; (b) 43.8270, 0.9404; (c) 44.8801, 0.9382; (d) 44.6682, 0.9388; (e) 43.5507, 0.9016. The remaining entries are not recoverable.]
1) FR Objective VQA Results for H.265/HEVC:
Considering the 11 state-of-the-art FR objective VQA metrics discussed in Section V, including the recently adopted VMAF trained on Netflix's videos [40], we attempt to determine the relation between subjective and objective measurements for benchmarking [41]. The detailed performance of the selected FR-VQA metrics at all three frame rates using HEVC is listed in Tables III, IV, and V, respectively.
Table III shows the performance of HEVC in terms of FR-VQA metrics at 15fps for different EVSs at different QP levels. For each EVS, as the QP level increases, the FR-VQA metrics vary correspondingly, just like the subjective quality assessment measurements discussed in Section VI(A). Owing to the larger coding tree unit and coding tree block sizes and the improved variable-block-size segmentation in HEVC [6], FR-VQA metrics such as PSNR, SSIM, WSNR, UQI, VIF, and VMAF [42] show better performance. However, just as different EVSs yield different DMOS values in Fig. 3(a), the FR objective VQA metrics also differ across EVSs. For QP within 27-31, all EVs show higher values of PSNR, SSIM, WSNR, UQI, VIF, and VMAF, reflecting Excellent quality. Similarly, as seen in Tables III(b) and III(e), the FR-VQA metrics take lower values for QP within 39-43, reflecting Bad quality. On the other hand, for QP within 39-43 in Tables III(a), III(c), and III(d), the FR-VQA metrics perform better, reflecting Fair visual quality. These results are highly correlated with the subjective measurements shown in Fig. 3(a). Table III also shows that, for a network with low bit-rate requirements, acceptable quality is achieved for the 15fps EVSs using HEVC with QP within 27-31, based on both subjective and objective models.
Table IV shows the performance of HEVC in terms of FR-VQA metrics at 30fps for different EVs at different QP levels. For each EVS, as the QP level increases, the FR-VQA metrics vary correspondingly, just like the subjective quality assessment measurements discussed in Section VI(A). Table IV also shows that increasing the frame rate increases the visual quality [10], [43]. For QP within 27-31, all EVSs show higher values of PSNR, SSIM, WSNR, UQI, VIF, and VMAF, reflecting Excellent quality. Similarly, as seen in Tables IV(b) and IV(e), the FR-VQA metrics take lower values for QP within 39-43, reflecting Poor quality. For QP within 39-43, Tables IV(a), IV(c), and IV(d) show higher values, reflecting Good visual quality. These results are highly correlated with the subjective measurements shown in Fig. 3(b). Table IV also shows that, for a network with low bit-rate requirements, acceptable quality is achieved for the 30fps EVs using HEVC with QP within 31-35, based on both subjective and objective models.
Similarly, Table V shows the performance of HEVC in terms of FR-VQA metrics at 60fps for different EVs at different QP levels. For each EVS, as the QP level increases, the FR-VQA metrics vary correspondingly, just like the subjective quality assessment measurements discussed in Section VI(A). Table V also shows that increasing the frame rate increases visual quality [10], [12], [43]. As shown in Fig. 3(c), the subjective measurements for all EVSs with QP within 39-43 reflect Fair visual quality, and Table V confirms, in a correlated fashion, higher values of the FR-VQA metrics in the same QP range. As observed in Tables V(a), V(c), and V(d), QP values within 35-39 yield higher FR-VQA metric values, reflecting Good visual quality, while QP values within 27-35 reflect Excellent visual quality. These results are highly correlated with the subjective measurements shown in Fig. 3(c). Note that the perceptual visual quality, in terms of both DMOS and FR objective metrics, increases with the frame rate. Table V also shows that, for a network with low bit-rate requirements, acceptable quality is achieved for the 60fps EVSs using HEVC with QP within 27-39, based on both subjective and objective models.
TABLE V: FR-VQM objective quality models corresponding to each clip at 60fps for H.265/HEVC.
[Sub-tables: (a) Books, (b) Catchtrack, (c) Hamster, (d) Sparkler, (e) Watersplashing. Columns: QP level, PSNR, SSIM, MS-SSIM, VSNR, WSNR, NQM, UQI, VIF, VIFP, IFC, VMAF. No table entries are recoverable.]
TABLE VI: FR-VQM objective quality models corresponding to each clip at 15fps for VP9.
[Sub-tables: (a) Books, (b) Catchtrack, (c) Hamster, (d) Sparkler, (e) Watersplashing. Columns: QP level, PSNR, SSIM, MS-SSIM, VSNR, WSNR, NQM, UQI, VIF, VIFP, IFC, VMAF. Surviving QP 27 entries (PSNR in dB, SSIM): (a) 38.1824, 0.9205; (b) 37.3307, 0.8713; (c) 38.3620, 0.9281; (d) 37.8244, 0.8905; (e) 37.0120, 0.8864. The remaining entries are not recoverable.]
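SSIM, one of the metrics in Tables III-VIII, combines luminance, contrast, and structure comparisons. The standard implementation evaluates the SSIM expression over local (e.g. Gaussian-weighted 11x11) windows and averages the resulting map; the sketch below is a deliberately simplified single-window variant that treats the whole frame as one window, assuming flat 8-bit frames and the usual constants K1 = 0.01 and K2 = 0.03. It illustrates the formula only and will not reproduce the tabulated values; the function name is hypothetical.

```python
def global_ssim(x, y, peak=255, k1=0.01, k2=0.03):
    """Single-window SSIM over whole frames given as flat sample sequences.

    Simplification: standard SSIM averages this expression over local
    windows; here one window covers the entire frame.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("frames must have equal size and at least two samples")
    c1 = (k1 * peak) ** 2  # stabilizes the luminance term
    c2 = (k2 * peak) ** 2  # stabilizes the contrast/structure term
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / (n - 1)
    vy = sum((b - my) * (b - my) for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


ramp = [0, 64, 128, 192, 255]
print(global_ssim(ramp, ramp))  # 1.0
```

Unlike PSNR, the covariance term makes SSIM sensitive to structure: a frame with identical mean brightness but inverted contrast scores below zero.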
2) FR Objective VQA Results for VP9:
Considering the 11 state-of-the-art FR objective VQA metrics discussed in Section V, including the recently adopted VMAF trained on Netflix's videos [40], we attempt to determine the relation between subjective and objective measurements for benchmarking. The detailed performance of the selected FR-VQA metrics for VP9 at 15fps, 30fps, and 60fps is given in Tables VI, VII, and VIII, respectively.
Table VI shows the performance of VP9 in terms of FR-VQA metrics for the 15fps EVSs at different QP levels. The FR-VQA metrics vary uniformly as the QP level changes, mirroring the effect of compression observed in the subjective quality assessment discussed in Section VI(A) and shown in Fig. 4. VP9, which is customized for video resolutions beyond 1080p and for lossless compression, is 20% less efficient than HEVC and requires two times the bit-rate to reach the same quality as HEVC. H.265/HEVC still delivers better visual quality in terms of FR-VQA metrics, at the cost of longer encoding time [21]. For 15fps, FR-VQA metrics such as PSNR, SSIM, WSNR, UQI, VIF, and VMAF show better performance, as shown in Table VI. For QP within 27-31, all EVSs show higher values of PSNR, SSIM, WSNR, UQI, VIF, and VMAF, reflecting Excellent quality. However, within the QP range 39-43, the FR-VQA metrics in Tables VI(b) and VI(e) reflect Poor quality, while Fair quality is observed in the same QP range in Tables VI(a), VI(c), and VI(d). Furthermore, within the QP range 31-35, the FR-VQA metrics for VP9 in Tables VI(a), VI(c), and VI(d) reflect Good quality, while in the same QP range the quality reflected in Tables VI(b) and VI(e) is Fair. The behavior of the FR-VQA metrics closely follows that observed in the subjective quality assessment, as shown in Fig. 4(a).
Table VII shows the performance of VP9 in terms of FR-VQA metrics for the 30fps EVSs at different QP levels. For each EVS, as the QP level increases, the FR-VQA metrics vary correspondingly, similar to the subjective quality measurements discussed in Section VI(A) and shown in Fig. 4(b). Table VII also shows that, as the frame rate increases, the visual quality in terms of FR-VQA metrics such as PSNR, SSIM, WSNR, UQI, VIF, and VMAF is better than that in Table VI. For QP within 27-31, all EVSs show higher values of PSNR, SSIM, WSNR, UQI, VIF, and VMAF, reflecting Excellent quality. However, within the QP range 39-43, the FR-VQA metrics reflect Fair quality for all EVSs, as shown in Table VII. Similarly, within the QP range 31-35, the FR-VQA metrics for VP9 in Tables VII(a), VII(c), and VII(d) reflect Good quality, while in the same QP range the quality reflected in Tables VII(b) and VII(e) is Fair. The behavior of the FR-VQA metrics closely follows that observed in the subjective quality assessment, as shown in Fig. 4(b). Tables VI and VII show that increasing the frame rate from 15fps to 30fps has only a modest effect on visual quality when VP9 is used as the encoder, whereas the effect is pronounced for H.265/HEVC for all EVSs at each QP level.
Similarly, Table VIII shows the performance of VP9 in terms of FR-VQA metrics at 60fps for different EVs at different QP levels. For each EVS, as the QP level increases, the FR-VQA metrics vary correspondingly, just like the subjective quality assessment measurements discussed in Section VI(A). Table VIII also shows that increasing the frame rate increases visual quality [10], [12], [43]. As shown in Fig. 4(c), the subjective measurements for all EVSs with QP within 39-43 reflect Fair visual quality, and Table VIII confirms, in a correlated fashion, higher values of the FR-VQA metrics in the same QP range. As observed in Tables VIII(a), VIII(c), and VIII(d), QP values within 35-39 yield higher FR-VQA metric values, reflecting Good visual quality, while QP values within 27-35 reflect Excellent visual quality. These results are highly correlated with the subjective measurements shown in Fig. 4(c). Note that the perceptual visual quality, in terms of both DMOS and FR objective metrics, increases with the frame rate. Table VIII also shows that, for a network with low bit-rate requirements, acceptable quality is achieved for the 60fps EVSs using VP9 with QP within 27-39, based on both subjective and objective models. Table VIII clearly shows that frame rate has a stronger effect on visual quality when VP9 is used at 60fps; in comparison, however, the effect is even more pronounced for H.265/HEVC for all EVSs at each QP level.
TABLE VII: FR-VQM objective quality models corresponding to each clip at 30fps for VP9.
[Sub-tables: (a) Books, (b) Catchtrack, (c) Hamster, (d) Sparkler, (e) Watersplashing. Columns: QP level, PSNR, SSIM, MS-SSIM, VSNR, WSNR, NQM, UQI, VIF, VIFP, IFC, VMAF. Surviving QP 27 entries (PSNR in dB, SSIM): (a) 43.4010, 0.9305; (b) 43.0286, 0.9288; (c) 43.8722, 0.9311; (d) 43.9005, 0.9355; (e) 43.0167, 0.8984. The remaining entries are not recoverable.]
TABLE VIII: FR-VQM objective quality models corresponding to each clip at 60fps for VP9.
[Sub-tables: (a) Books, (b) Catchtrack, (c) Hamster, (d) Sparkler, (e) Watersplashing. Columns: QP level, PSNR, SSIM, MS-SSIM, VSNR, WSNR, NQM, UQI, VIF, VIFP, IFC, VMAF. No table entries are recoverable.]
TABLE IX: Correlation coefficient scores (PLCC and SROCC) between FR-VQA metrics and DMOS for H.265/HEVC at 15fps, 30fps, and 60fps.
[Rows: PLCC and SROCC at each frame rate. Columns: PSNR, SSIM, MS-SSIM, VSNR, WSNR, NQM, UQI, VIF, VIFP, IFC, VMAF. Only one entry is recoverable: 15fps PLCC for PSNR = 0.8783.]
TABLE X: Correlation coefficient scores (PLCC and SROCC) between FR-VQA metrics and DMOS for VP9 at 15fps, 30fps, and 60fps.
[Rows: PLCC and SROCC at each frame rate. Columns: PSNR, SSIM, MS-SSIM, VSNR, WSNR, NQM, UQI, VIF, VIFP, IFC, VMAF. Only one entry is recoverable: 15fps PLCC for PSNR = 0.8622.]
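Tables IX and X report PLCC and SROCC between each FR-VQA metric and DMOS, which the authors computed with built-in MATLAB functions. For readers working in Python, a from-scratch sketch of both coefficients follows (scipy.stats.pearsonr and scipy.stats.spearmanr are library equivalents). SROCC is computed as Pearson's coefficient on ranks, with tied values given their average rank. The coefficients are computed directly on the raw scores without any nonlinear regression step, which is an assumption about the paper's procedure; the score lists in the example are hypothetical, not data from the tables.

```python
import math


def plcc(x, y):
    """Pearson's linear correlation coefficient (undefined for constant input)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman's rank-order correlation: Pearson's coefficient on average ranks."""
    return plcc(average_ranks(x), average_ranks(y))


# Hypothetical scores for five QP levels of one clip.
psnr_db = [40.1, 37.9, 35.2, 32.8, 30.0]
dmos = [4.0, 9.5, 21.0, 38.0, 55.5]
print(srocc(psnr_db, dmos))  # -1.0
print(abs(plcc(psnr_db, dmos)))  # compare magnitudes: higher PSNR pairs with lower DMOS
```

The negative sign in the example arises because DMOS grows as fidelity falls, so the strength of agreement is read from the magnitude of each coefficient.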
C. Correlation between Subjective DMOS and FR-VQM Measurements:
This section validates and statistically evaluates the FR-VQM objective quality models listed in Tables III-VIII. We computed Pearson's Linear Correlation Coefficient (PLCC) and Spearman's Rank-Order Correlation Coefficient (SROCC) between each of the 11 state-of-the-art FR-VQA metrics and the DMOS values using built-in MATLAB functions. These statistical evaluations are conducted for both HEVC and VP9 at all frame rates (15fps, 30fps, and 60fps), and the results are shown in Tables IX and X, respectively. Note that PLCC and SROCC take values in [-1, 1]; a magnitude close to 1 is preferable and indicates a high correlation.
As observed in Table IX, H.265/HEVC yields high PLCC and SROCC values between the DMOS and the FR-VQA performance metrics at all three frame rates (15fps, 30fps, and 60fps). In particular, FR-VQA metrics such as PSNR, WSNR, UQI, VIF, and VMAF show high PLCC and SROCC values at all three frame rates. This indicates that, for FHD content encoded at different QP levels using H.265/HEVC, FR-VQA metrics such as PSNR, VIF, and VMAF perform better and are recommended.
Similarly, VP9 yields high PLCC and SROCC values between the DMOS and the FR-VQA performance metrics at all three frame rates (15fps, 30fps, and 60fps). In particular, FR-VQA metrics such as PSNR, VSNR, WSNR, VIF, and VMAF show high PLCC and SROCC values at all three frame rates. This indicates that, for FHD content encoded at different QP levels using VP9, FR-VQA metrics such as PSNR, VSNR, WSNR, and VMAF perform better and are recommended.
VII. RECOMMENDATIONS FOR ENHANCING THE QUALITY ESTIMATION OF FR-VQMS:
All the aforementioned FR-VQMs are designed to estimate the spatial degradations of an image or a video sequence. Estimating the perceptual quality of compressed videos at varying frame rates also requires information about temporal degradation. This information can be acquired by processing the compressed video sequences and extracting certain features, and these measurements or features should be included in the overall quality estimation of a video sequence.
• The first and most important measure is the temporal difference between two consecutive frames of a video.
The higher the difference, the higher the motion complexity of the video. This feature helps in estimating the motion content of a video sequence. It is expected that videos with higher frame rates have higher motion complexity, and vice versa. The simplest way to estimate the temporal frame difference is to take the difference between the pixel intensities of two consecutive frames, where the pixel intensities lie within the range 0-255 for 8-bit video. The average, maximum, and minimum of these temporal differences can be processed further for more accurate quality estimation.
• Temporal difference measurements, along with the frame rate, the duration of the video sequence under evaluation, and the total number of frames, should be included in the overall quality estimation of a compressed video sequence.
• Furthermore, the temporal difference helps in highlighting scene changes in a video sequence. For example, if there is a scene change in a video sequence, the temporal difference between the consecutive frames where the scene change happens will be much higher than in the rest of the sequence. Scene changes in compressed videos can lead to a different perceptual quality, and highlighting them can help in better estimation of video quality.
The aforementioned measures can be incorporated into any FR-VQM to enhance the overall quality estimation and to achieve a better correlation between the VQMs' measurements and the DMOS.
VIII. CONCLUSION
This paper investigated the impact of HFR on the perceptual quality of compressed FHD videos. FHD video content at three frame rates, obtained from the BVI-HFR database, was compressed using the H.265/HEVC and VP9 encoders at five QP levels. A detailed subjective quality assessment of the compressed videos for both encoders and each frame rate was performed, yielding DMOS values as subjective measurements. A benchmarking investigation of FR objective quality assessment using 11 state-of-the-art metrics was conducted to show the correlation between the subjective and objective models. We showed that H.265/HEVC performs better than VP9 at each frame rate and QP level. As the frame rate increased, the perceptual quality in terms of FR-VQA metrics such as PSNR, SSIM, WSNR, UQI, VIF, and VMAF also increased for both H.265/HEVC and VP9. Based on the statistical evaluation of both models in terms of correlation coefficients (PLCC and SROCC), FR-VQA metrics with high correlation, such as PSNR and VMAF, are recommended for compressed FHD content for both H.265/HEVC and VP9. We also presented recommendations for enhancing the quality estimation of full-reference (FR) video quality measurements (VQMs). Furthermore, for networks with low bit-rate requirements, the recommended QP range for each frame rate and encoder was identified, providing better visual quality to users.
There are several relevant research concerns to be addressed in the future. Further investigation of the impact of HFRs, such as 90fps and 120fps, at different QP levels for 4K and 8K UHD video content would be significant and interesting. Besides, developing new objective quality metrics for such environments is desirable.
ACKNOWLEDGMENT
The authors would like to thank...
REFERENCES
[1] M. Cheon and J.-S. Lee, “Subjective and objective quality assessment of compressed 4K UHD videos for immersive experience,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 7, pp. 1467–1480, 2017.
[2] A. Mackin, F. Zhang, and D. R. Bull, “A study of high frame rate video formats,” IEEE Transactions on Multimedia, vol. 21, no. 6, pp. 1499–1512, 2018.
[3] M. A. Usman, M. R. Usman, and S. Y. Shin, “Exploiting the spatio-temporal attributes of HD videos: A bandwidth efficient approach,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 9, pp. 2418–2422, 2018.
[4] ITU-R BT.2020, “Parameter values for ultra-high definition television systems for production and international program exchange,” 2015.
[5] Q. Huang, S. Y. Jeong, S. Yang, D. Zhang, S. Hu, H. Y. Kim, J. S. Choi, and C.-C. J. Kuo, “Perceptual quality driven frame-rate selection (PQD-FRS) for high-frame-rate video,” IEEE Transactions on Broadcasting, vol. 62, no. 3, pp. 640–653, 2016.
[6] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, 2012.
[7] J. Bankoski, R. S. Bultje, A. Grange, Q. Gu, J. Han, J. Koleszar, D. Mukherjee, P. Wilkins, and Y. Xu, “Towards a next generation open-source video codec,” in Visual Information Processing and Communication IV, vol. 8666. International Society for Optics and Photonics, 2013, p. 866606.
[8] M. Sugawara, K. Omura, M. Emoto, and Y. Nojiri, “P-30: Temporal sampling parameters and motion portrayal of television,” in SID Symposium Digest of Technical Papers, vol. 40, no. 1. Wiley Online Library, 2009, pp. 1200–1203.
[9] Y.-F. Ou, Z. Ma, T. Liu, and Y. Wang, “Perceptual quality assessment of video considering both frame rate and quantization artifacts,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 3, pp. 286–298, 2010.
[10] A. Mackin, F. Zhang, and D. R. Bull, “A study of subjective video quality at various frame rates,” in . IEEE, 2015, pp. 3407–3411.
[11] R. M. Nasiri and Z. Wang, “Perceptual aliasing factors and the impact of frame rate on video quality,” in . IEEE, 2017, pp. 3475–3479.
[12] A. Mackin, F. Zhang, M. A. Papadopoulos, and D. Bull, “Investigating the impact of high frame rates on video compression,” in . IEEE, 2017, pp. 295–299.
[13] L. M. Wilcox, R. S. Allison, J. Helliker, B. Dunk, and R. C. Anthony, “Evidence that viewers prefer higher frame-rate film,” ACM Transactions on Applied Perception (TAP), vol. 12, no. 4, p. 15, 2015.
[14] R. S. Allison, L. M. Wilcox, R. C. Anthony, J. Helliker, and B. Dunk, “Expert viewers' preferences for higher frame rate 3D film,” Electronic Imaging, vol. 2017, no. 5, pp. 20–28, 2017.
[15] M. Emoto, Y. Kusakabe, and M. Sugawara, “High-frame-rate motion picture quality and its independence of viewing distance,” Journal of Display Technology, vol. 10, no. 8, pp. 635–641, 2014.
[16] A. B. Watson, “High frame rates and human vision: A view through the window of visibility,” SMPTE Motion Imaging Journal, vol. 122, no. 2, pp. 18–32, 2013.
[17] S. Kime, F. Galluppi, X. Lagorce, R. B. Benosman, and J. Lorenceau, “Psychophysical assessment of perceptual performance with varying display frame rates,” Journal of Display Technology, vol. 12, no. 11, pp. 1372–1382, 2016.
[18] Y. Kuroki, H. Takahashi, M. Kusakabe, and K.-i. Yamakoshi, “Effects of motion image stimuli with normal and high frame rates on EEG power spectra: comparison with continuous motion image stimuli,” Journal of the Society for Information Display, vol. 22, no. 4, pp. 191–198, 2014.
[19] M. Haak, S. Bos, S. Panic, and L. Rothkrantz, “Detecting stress using eye blinks and brain activity from EEG signals,” Proceeding of the 1st Driver Car Interaction and Interface (DCII 2008), pp. 35–60, 2009.
[20] K. Debattista, K. Bugeja, S. Spina, T. Bashford-Rogers, and V. Hulusic, “Frame rate vs resolution: A subjective evaluation of spatiotemporal perceived quality under varying computational budgets,” in Computer Graphics Forum, vol. 37, no. 1. Wiley Online Library, 2018, pp. 363–374.
[21] N. Barman and M. G. Martini, “H.264/MPEG-AVC, H.265/MPEG-HEVC and VP9 codec comparison for live gaming video streaming,” in . IEEE, 2017, pp. 1–6.
[22] Y.-F. Ou, T. Liu, Z. Zhao, Z. Ma, and Y. Wang, “Modeling the impact of frame rate on perceptual quality of video,” in . IEEE, 2008, pp. 689–692.
[23] Y.-F. Ou, Y. Xue, and Y. Wang, “Q-STAR: A perceptual video quality model considering impact of spatial, temporal, and amplitude resolutions,” IEEE Transactions on Image Processing, vol. 23, no. 6, pp. 2473–2486, 2014.
[24] J. Nightingale, Q. Wang, C. Grecos, and S. Goma, “The impact of network impairment on quality of experience (QoE) in H.265/HEVC video streaming,” IEEE Transactions on Consumer Electronics, vol. 60, no. 2, pp. 242–250, 2014.
[25] Z. Ma, M. Xu, Y.-F. Ou, and Y. Wang, “Modeling of rate and perceptual quality of compressed video as functions of frame rate and quantization stepsize and its applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 5, pp. 671–682, 2011.
[26] J. He, E.-H. Yang, F. Yang, and K. Yang, “Adaptive quantization parameter selection for H.265/HEVC by employing inter-frame dependency,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 12, pp. 3424–3436, 2017.
[27] M. Řeřábek and T. Ebrahimi, “Comparison of compression efficiency between HEVC/H.265 and VP9 based on subjective assessments,” in Applications of Digital Image Processing XXXVII, vol. 9217. International Society for Optics and Photonics, 2014, p. 92170U.
[28] J. Bienik, M. Uhrina, M. Kuba, and M. Vaculik, “Performance of H.264, H.265, VP8 and VP9 compression standards for high resolutions,” in . IEEE, 2016, pp. 246–252.
[29] M. Řeřábek, P. Hanhart, P. Korshunov, and T. Ebrahimi, “Quality evaluation of HEVC and VP9 video compression in real-time applications,” in . IEEE, 2015, pp. 1–6.
[30] ITU-R, “Methodology for the subjective assessment of the quality of television pictures,” Recommendation ITU-R BT.500-13, 2012.
[31] ITU-T Recommendation, “Subjective video quality assessment methods for multimedia applications,” International Telecommunication Union
IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[35] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2. IEEE, 2003, pp. 1398–1402.
[36] D. M. Chandler and S. S. Hemami, “VSNR: A wavelet-based visual signal-to-noise ratio for natural images,” IEEE Transactions on Image Processing, vol. 16, no. 9, pp. 2284–2298, 2007.
[37] H. R. Sheikh, A. C. Bovik, and G. De Veciana, “An information fidelity criterion for image quality assessment using natural scene statistics,” IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2117–2128, 2005.
[38] Y. Han, Y. Cai, Y. Cao, and X. Xu, “A new image fusion performance metric based on visual information fidelity,” Information Fusion, vol. 14, no. 2, pp. 127–135, 2013.
[39] N. Damera-Venkata, T. D. Kite, W. S. Geisler, B. L. Evans, and A. C. Bovik, “Image quality assessment based on a degradation model,” IEEE Transactions on Image Processing, vol. 9, no. 4, pp. 636–650, 2000.
[40] Netflix, “Video multi-method assessment fusion,” 2019.
[41] Z. Wang, J. Wang, F. Wang, C. Li, Z. Fei, and T. Rahim, “A video quality assessment method for VoIP applications based on user experience,” Sensing and Imaging, vol. 18, no. 1, p. 12, 2017.
[42] M. A. Usman and M. G. Martini, “On the suitability of VMAF for quality assessment of medical videos: Medical ultrasound & wireless capsule endoscopy,” Computers in Biology and Medicine, vol. 113, p. 103383, 2019.
[43] M. A. Usman, M. R. Usman, and S. Y. Shin, “A novel no-reference metric for estimating the impact of frame freezing artifacts on perceptual quality of streamed videos,” IEEE Transactions on Multimedia, vol. 20, no. 9, pp. 2344–2359, 2018.
Tariq Rahim is a Ph.D. student at the Wireless and Emerging Network System Laboratory (WENS Lab.), Department of IT Convergence Engineering, Kumoh National Institute of Technology, Republic of Korea. He received the master's degree in Information and Communication Engineering from Beijing Institute of Technology, P.R. China, in 2017. His research interests include image and video processing and quality of experience (QoE) for high-resolution and high-frame-rate videos.
Muhammad Arslan Usman received the B.S. degree in Electrical Engineering (Telecommunications) from COMSATS University, Lahore, Pakistan, in 2010 and the M.S. degree in Electrical Engineering (Signal Processing) from Blekinge Tekniska Högskola (BTH), Karlskrona, Sweden, in 2013. He completed his Ph.D. degree in IT Convergence Engineering at Kumoh National Institute of Technology, South Korea, in Jan. 2018, and then continued in the same research group as a postdoctoral research fellow for four months. His research interests include quality of experience (QoE) estimation, modelling, and management for medical and general videos, object classification and detection in computer vision applications, and next-generation wireless networks including non-orthogonal multiple access. He is currently a postdoctoral research fellow with the Wireless Multimedia and Networking (WMN) research group, Kingston University, United Kingdom.