A multi-level approach with visual information for encrypted H.265/HEVC videos
Wenying Wen, Rongxin Tu, Yushu Zhang, Yuming Fang, Yong Yang
Wenying Wen (a), Rongxin Tu (a), Yushu Zhang (b, ∗), Yuming Fang (a) and Yong Yang (a)
(a) School of Information Management, Jiangxi University of Finance and Economics, Jiangxi, 330013, China.
(b) College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
ARTICLE INFO
Keywords: H.265/HEVC; Multi-level encryption; Visual information; Luma intraprediction mode; DCT coefficient sign
ABSTRACT
High-efficiency video coding (HEVC) encryption protects a video by encrypting its syntax elements. To achieve high security, to the best of our knowledge, almost all existing HEVC encryption algorithms encrypt the whole video, so that a user without permission cannot obtain any viewable information. However, these algorithms cannot meet the needs of customers who want part, but not all, of the information in a video. In many cases, such as professional paid videos or video meetings, users would like to observe some visible information in the encrypted version of the original video. To address this demand, this paper proposes a multi-level encryption scheme composed of lightweight, medium and heavyweight encryption, where each encryption level reveals a different amount of visual information. First, we use AES-CTR to generate a pseudo-random number sequence. Then, the main syntax elements in the H.265/HEVC encoding process are encrypted with this pseudo-random sequence. In the lightweight encryption level, the syntax element of the luma intraprediction mode is chosen for encryption. In the medium encryption level, the syntax element of the DCT coefficient sign is scrambled. In the heavyweight encryption level, the syntax elements of both the luma intraprediction mode and the DCT coefficient sign are encrypted simultaneously with the pseudo-random sequence. We find that either encrypting the luma intraprediction mode (IPM) or scrambling the DCT coefficient sign alone produces a distorted video that still contains residual visual information, while encrypting both of them achieves strong encryption from which no visual information can be gained.
The experimental results meet our expectations, indicating that each encryption level contains a different amount of visual information. Users can therefore flexibly choose the encryption level according to their requirements.
1. Introduction
High-efficiency video coding (HEVC) [1] is the latest video coding standard. It was published in 2013 by the Joint Collaborative Team on Video Coding (JCT-VC), formed by ISO/IEC MPEG and ITU-T VCEG, and it compresses video with high efficiency. HEVC is suited to transmission and storage ranging from small-scale multimedia networks to large-scale TV distributors and has therefore been widely used in daily life [2–4]. Video contains an enormous amount of information, including private, sensitive and copyrighted items [5, 6], which can easily be leaked over unreliable public channels or through insecure cloud services. Video encryption, a technology applied in military, medical and other industries to maintain data security, has thus become a challenging research topic.

Video encryption provides a secure channel during transmission. To the best of our knowledge, most existing video encryption schemes encrypt the whole video, so that a user without permission cannot obtain any viewable information: a user with the secret key can see the original video, whereas users without the key receive no visual information at all. However, this offers too few choices for people with a variety of demands. For example, in the professional paid video scenario, users have to pay an expensive fee to watch the video; otherwise, they cannot see any video information. Many people need professional videos but cannot afford the cost. If the encrypted video could be divided into multiple levels containing different amounts of video information, the video provider could set multiple charging standards according to the amount of video information offered. To some extent, this alleviates the mismatch between supply and demand in the market, which benefits both users and providers. Another instance occurs in important video meetings: if the encrypted meeting video is divided into multiple levels containing different amounts of information, the organizer can assign users to multiple grades according to the amount of visual information they are allowed to see. In this way, meeting videos can become even more effective in people's work and lives [7, 8].

This paper proposes a multi-level encryption approach based on AES, yielding a tunable selective encryption scheme that can meet the various requirements of users. First, we use AES-CTR to generate a pseudo-random number sequence. Then, the main syntax elements in the H.265/HEVC encoding process are encrypted with this pseudo-random sequence. At the lightweight and medium levels, only one syntax element is encrypted [9]. Encrypting the luma intraprediction mode (IPM) and scrambling the DCT coefficient sign can achieve multi-level encryption with different amounts of visual information, so the different levels of the encrypted video can adapt to various application scenarios according to user requirements.

Wen et al.:
Preprint submitted to Elsevier
arXiv [cs.MM] Nov

Table 1
Description of symbols used in the paper
Symbol | Definition
HEVC | High-Efficiency Video Coding
AES | Advanced Encryption Standard
DES | Data Encryption Standard
IDEA | International Data Encryption Algorithm
PM | Prediction Modes
IPM | Intraprediction Modes
MV | Motion Vectors
MI | Merge Index
MVD | Motion Vector Difference
TC | Transform Coefficients
RPS | Reference Picture Set
QP | Quantization Parameter
MVP | Motion Vector Prediction
RefFI | Reference Frame Index
NEA | Naïve Encryption Algorithms
SE | Selective Encryption Algorithms
DC | Direct Current
DCT | Discrete Cosine Transform
CAVLC | Context-Adaptive Variable Length Coding
PSNR | Peak Signal-to-Noise Ratio
SSIM | Structural Similarity
NPCR | Number of Pixels Change Rate
UACI | Uniform Average Change Intensity

The main contributions of this paper are as follows:
1. A multi-level selective encryption scheme for H.265/HEVC is proposed. It divides the encrypted video into three levels. In video frames at the lightweight encryption level, only the luma IPM (see Table 1) is encrypted. In video frames at the medium encryption level, the DCT coefficient sign is chosen for encryption. In the heavyweight encryption level, both the luma IPM and the DCT coefficient sign are encrypted.
2. In the proposed scheme, different levels contain different amounts of visual information. In the first two levels, the features of the object and its outline and structure can still be identified, while no useful information can be gained from the encrypted video at the last level. The proposed multi-level approach for H.265/HEVC meets the requirements of various application scenarios and demonstrates the flexibility of the proposed algorithm.
3. We theoretically analyse and experimentally test the performance of encrypting the luma IPM and the DCT coefficient sign. Encrypting both syntax elements greatly distorts the video, while encrypting either one individually preserves different kinds of visual information.

The rest of the paper is organized as follows. Section 2 reviews related work on video encryption. Preliminary knowledge of the HEVC framework and the AES algorithm is introduced in Section 3. The proposed multi-level encryption scheme is presented in Section 4. The experimental results and security analysis are given in Section 5. Finally, conclusions are drawn in Section 6.
Table 2
The characteristics of the related H.265/HEVC encryption algorithms
Encryption scheme | Encryption elements | Entire video encrypted
Qiao [10] | Bit stream | Yes
Jolly [11] | Bit stream | Yes
Lian [12] | TC and MVD sign | Yes
Wallendael [13] | IPM, RFI | Yes
Wallendael [14] | RPS, QP, residual sign and SAO | Yes
Peng [15] | Residual sign, RFI, DC coefficient sign, MVD sign and value | No
Wang [16] | IPM, inter PM | No
Zhao [17] | Compressed domain | Yes
V. A. Memos [18] | Residual coefficients of I frame | Yes
Boyadjis [19] | Luma IPM, SAO, MVP index, MVD sign and value | Yes
Peng [20] | Luma IPM, chroma IPM, RefFI, MI, MVD sign and value, MVP index, residual sign and value, SAO | Yes
2. Related Work
In earlier encryption schemes, a coded video is regarded as a bit stream, and traditional ciphers such as the Advanced Encryption Standard (AES) [21] or other stream ciphers are used to encrypt it. Such methods are called naive encryption algorithms (NEA). NEA do not exploit any syntax elements or special structures of the video; they simply treat the HEVC stream as text data. Because every byte is encrypted and no practical attack that breaks AES is known, NEA can provide high security for the video. In [10], Qiao et al. proposed NEA for MPEG videos, in which each byte of the MPEG-encoded video is encrypted with the International Data Encryption Algorithm (IDEA), used to generate a pseudo-random sequence. In [11], J. Shah et al. proposed a scheme that uses DES and AES to encrypt the bit stream of MPEG video. Both schemes provide high security, which is guaranteed by AES [21] or DES [22]. However, they are not suitable for large videos because encryption is very slow. Moreover, using NEA to encrypt high-resolution video incurs high computational complexity, which makes it impossible to meet the requirements of real-time transmission [23]. Therefore, selective encryption (SE) algorithms [24–28] for video have been attracting much attention.

Figure 1: The framework of H.265/HEVC.

In the video coding process, some syntax elements play a very significant role and affect the quality of the final encoded video. Selective encryption algorithms usually encrypt such important syntax elements during coding, so that a standard decoder can still decode the encrypted video. After decoding, however, the encrypted video is seriously distorted and no useful information can be obtained from it, while users with the secret key can recover the original video. SE for H.265/HEVC streams was developed from the work of Lian et al. [12], who proposed encrypting syntax elements such as intraprediction modes, transform coefficients and motion information to distort the video. In 2013, the new H.265/HEVC standard was published. Wallendael et al. [13] proposed an extensive encryption scheme that selects syntax elements in the encoding process for encryption, including both intra and inter syntax elements of the H.265/HEVC stream. Encrypting these syntax elements simultaneously distorts the video frames and thereby achieves encryption. In [14], Wallendael et al. encrypted more syntax elements, including the reference picture set (RPS), quantization parameter (QP), inter-frame information, residual information, and de-blocking and sample adaptive offset parameters. The experimental results indicate that encrypting these syntax elements further enhances the encryption effect. An SE algorithm was proposed by Peng et al.
[15], who used the Rössler chaotic system to generate a pseudo-random sequence that encrypts many syntax elements. Although the scheme has good encryption performance, the coding bit rate generally increases. Wang et al. [16] exploited the dependency between current and descendant frames: only the current frames, on which the descendant frames depend, are encrypted, while the dependent frames are left unencrypted. This method therefore reduces the bit-rate increase to a large extent. Zhao et al. [17] divided the video frames into foreground and background and encrypted only the foreground, which contains the important information. Peng et al. [29] employed a protection scheme based on FMO and chaos for regions of interest (ROI), which keeps the bit rate of the video low. V. A. Memos et al. [18] proposed an unequal scheme that encrypts the residual coefficients of the I frames. It also achieves good visual distortion, because B frames and P frames are predicted from the I frames. However, the encryption space is small and the security is insufficient. Boyadjis et al. [19] presented a method that encrypts syntax elements such as the luma intraprediction mode and the quantized transform coefficients. Although the encryption performance of the I frames is enhanced, the information in edge regions is not given much consideration. Peng et al. [20] extended this technique by encrypting the edge regions, scrambling coefficients based on edge extraction. To further enhance encryption performance, they also encrypted the chroma intraprediction mode, which greatly increased the distortion of the video.
Figure 2: The diagram of the AES encryption algorithm.
However, most of the aforementioned video encryption algorithms focus on encrypting the whole video through its syntax elements. Authorized users can obtain the original video, while unauthorized users obtain an encrypted video without any useful information. Whole-video encryption therefore offers too few choices to meet the needs of people with various demands. The characteristics of the aforementioned H.265/HEVC encryption algorithms are listed in Table 2. To solve these problems in existing video encryption algorithms, this paper proposes a multi-level video encryption scheme based on the AES cipher that provides sufficient choices for users.
3. Preliminary Knowledge
Regarding the AES encryption function as E, the encryption process is depicted as follows:
$$ C = E(K, P) \tag{1} $$
where K is the secret key, P is the binary code, and C is the encrypted binary code. That is, given the secret key and the binary code as inputs, the function E outputs the encrypted binary code. Regarding the AES decryption function as D, the decryption process can be represented as follows:
$$ P = D(K, C) \tag{2} $$
The decryption process is the inverse of the encryption process. In this paper, P is the binary code of the entropy-coded video bitstream, and C is the scrambled (encrypted) entropy-coded bitstream.
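Because Section 4 instantiates E with AES in counter mode (Eq. (3)), a small sketch of the CTR construction may help make Eqs. (1) and (2) concrete: a keyed function is applied to successive counter values to produce a keystream, the plaintext bits are XORed with that keystream, and decryption D is exactly the same operation. Python's standard library has no AES, so this sketch assumes SHA-256 over (key, nonce, counter) as a stand-in for the AES block encryption; it is not AES and only illustrates the structure. The names `ctr_keystream`, `ctr_encrypt` and `ctr_decrypt` are hypothetical.

```python
import hashlib


def ctr_keystream(key: bytes, nonce: bytes, nbytes: int) -> bytes:
    """Counter-mode keystream of `nbytes` bytes.

    Stand-in assumption: SHA-256(key || nonce || counter) replaces the AES
    block encryption of each counter value. This is NOT AES.
    """
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        block_input = key + nonce + counter.to_bytes(8, "big")
        out += hashlib.sha256(block_input).digest()  # one 32-byte keystream block
        counter += 1  # successive counter values, never reused under one nonce
    return bytes(out[:nbytes])


def ctr_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """C = E(K, P): XOR the plaintext with the keystream."""
    stream = ctr_keystream(key, nonce, len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def ctr_decrypt(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """P = D(K, C): in counter mode, decryption is the same XOR."""
    return ctr_encrypt(key, nonce, ciphertext)
```

For example, `ctr_decrypt(key, nonce, ctr_encrypt(key, nonce, bits))` returns `bits` for any byte string, which is the inverse relation between Eqs. (1) and (2). Counter mode turns the block function into a stream cipher, which is why it suits bit-level syntax elements of arbitrary length.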
4. The Proposed Scheme
This paper proposes a multi-level encryption method for H.265/HEVC. First, we use AES-CTR to generate a pseudo-random number sequence. Then, the main syntax elements in the H.265/HEVC encoding process are encrypted with this pseudo-random sequence. The scheme divides the encrypted video into three levels. In video frames at the lightweight encryption level, only the luma IPM (see Table 1) is encrypted, and the features of the object can still be identified. In video frames at the medium encryption level, the DCT coefficient sign is chosen for encryption, and the outline and structure of the object can still be identified. In the heavyweight encryption level, both the luma IPM and the DCT coefficient sign are encrypted, and no useful information can be gained from the encrypted video. The proposed multi-level approach for H.265/HEVC meets the requirements of various application scenarios and demonstrates the flexibility of the proposed algorithm. The framework of the proposed scheme is shown in Figure 3.
Figure 3: The proposed multi-level encryption scheme.
In each encryption level, the related syntax elements are encrypted using the AES algorithm. First, we employ AES-CTR with an initial key N to generate a pseudo-random sequence K (the secret key), which can be written as
$$ K = \mathrm{AES\text{-}CTR}(N) \tag{3} $$
where AES-CTR(·) denotes AES operated in counter mode. By turning a block cipher into a stream cipher, counter mode generates a pseudo-random sequence of arbitrary length by encrypting successive counter values; since the counter does not repeat for a long period, neither does the generated sequence. More details of CTR are described in [39].

Then, we use the generated pseudo-random sequence K to encrypt each binary syntax element. The length of K depends on the syntax elements being encrypted. The encryption process is described below.

4.1 Lightweight Encryption Level
In the lightweight encryption level, the syntax element of the luma IPM is encrypted. The luma IPM plays a significant role in the coding process, and Boyadjis et al. [19] proposed encrypting it. Because of the strong correlation between the current coding unit and its neighbouring pixels, HEVC predicts the current coding unit from already-encoded pixels. HEVC provides 35 intraprediction modes (0 to 34): planar, DC and 33 angular modes. The encoder traverses the prediction modes and selects the one with the minimum rate-distortion cost as the optimal prediction mode.

Moreover, the encoder does not code the optimal prediction mode directly, which would require a 5-bit offset (DIR). Instead, it first builds a candidate list of three modes from the neighbouring PUs, because the current unit is very likely to use the same mode as its neighbours. If the current intraprediction mode is in the list, the encoder only needs to code its index in this short list rather than the 5-bit value, which greatly reduces the bit rate. The list index Idx is encrypted as
$$ \mathit{En\_Idx} = (\mathit{Idx} + K_i) \bmod 3, \quad 0 \le K_i \le 2 \tag{4} $$
where K_i represents a segment of the pseudo-random sequence K. If the current luma IPM is not in the list, the 5-bit value Idx of the optimal prediction mode is scrambled instead; the result is very likely a poor prediction mode, which distorts the video during coding. This encryption is defined as
$$ \mathit{En\_Idx} = \mathit{Idx} \oplus K_i, \quad 0 \le K_i \le 31 \tag{5} $$
where ⊕ represents the XOR operation. The encryption of the luma IPM thus performs
XOR operations between the 5-bit code of the prediction mode and the secret key (or, for the candidate-list index, the modulo-3 addition of Eq. (4)). In other words, the coded optimal prediction mode is scrambled into another prediction mode that is unsuitable for predicting the current coding unit and is very likely to yield a poor prediction, which distorts the decoded video. However, encrypting only the luma intraprediction mode cannot achieve full encryption, and the outline information of the objects remains visible.

An encrypted video that still carries visual information is nevertheless exactly what certain application scenarios require. This paper therefore sets the lightweight encryption level, in which the luma intraprediction mode is encrypted, as the first level.

The luma intraprediction mode must be the same during coding and decoding; otherwise, decoding fails because different modes require different parameters. To solve this synchronization problem between coding and decoding, the encoder sets up a new array to record the scrambled list index or bit offset.

Figure 4: Video test sequences. (a) Akiyo. (b) Bowing. (c) Deadline. (d) Irene. (e) Foreman. (f) Paris. (g) Mother. (h) Football. (i) Pamphlet. (j) Container.

4.2 Medium Encryption Level
In the medium encryption level, the syntax element of the DCT coefficient sign is encrypted. In HEVC, to compress the video further without much distortion, the residual is transformed from the spatial domain into the frequency domain using the DCT. In the frequency domain, the low-frequency coefficients carry the main information, whereas the high-frequency coefficients carry object edge information, which is often quantized to zero because of its small visual effect. After the discrete cosine transform, the 2-D block of DCT coefficients is converted into a 1-D array using a scan pattern that defines the processing order of the coefficients. The 1-D array is then coded by context-adaptive variable length coding (CAVLC) [40]. After the DCT and quantization, the array contains many zeros, and CAVLC codes the number of zeros as well as the position, value and sign of the non-zero coefficients. More details of CAVLC are given in [40]. In the coding process, TotalCoeffs and TrailingOnes cannot be encrypted, because doing so would cause decoding failure [41]. In the proposed scheme, the signs of the TrailingOnes are encrypted. The values coefsign = 1 and coefsign = 0 represent positive and negative, respectively, and encryption swaps the two values.
After the DCT matrix is scanned, the TrailingOnes lie at the right-hand end of the 1-D array, which holds the high-frequency information. These details mainly enhance image sharpness, so encrypting the signs of the TrailingOnes does not affect the overall outline and acts only as a slight perturbation of the image. The encryption of the DCT coefficient sign is represented as
$$ \mathit{En\_sign} = \mathit{sign} \oplus K_i, \quad 0 \le K_i \le 1 \tag{6} $$
Although this syntax element is encrypted, a large amount of visual information remains in the video, and an encrypted video that retains visual information is exactly what certain application scenarios require. This paper therefore uses the encryption of the DCT coefficient sign as the second level, that is, the medium encryption level.

4.3 Heavyweight Encryption Level
In the heavyweight encryption level, both the luma IPM and the DCT coefficient sign are chosen for encryption, using the methods described in Section 4.1 and Section 4.2. When both syntax elements are encrypted, no visual information can be gained from the video. Encrypting both the luma IPM and the DCT coefficient sign therefore constitutes the third level, that is, the heavyweight encryption level.
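The three levels of Sections 4.1 to 4.3 can be sketched at the level of syntax values, ignoring entropy coding and the encoder-side bookkeeping array described in Section 4.1. This is a minimal sketch under stated assumptions, not the authors' HM implementation: a PU's luma IPM is modelled as either an index into the three-entry candidate list or a 5-bit code, the DCT coefficient signs as a list of bits, and the key segments K_i as bytes from a SHA-256 counter-mode stand-in for AES-CTR. The names `LumaIPM`, `key_segments`, `apply_level` and `LEVELS` are hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


def key_segments(key: bytes, count: int) -> list[int]:
    """Hypothetical helper: `count` key bytes K_i from a counter-mode stream.

    SHA-256 stands in for AES-CTR here (an assumption, not the paper's cipher).
    """
    out = bytearray()
    counter = 0
    while len(out) < count:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return list(out[:count])


@dataclass(frozen=True)
class LumaIPM:
    """Hypothetical container for one PU's coded luma intraprediction mode."""
    in_candidate_list: bool  # mode found in the three-entry candidate list?
    idx: int                 # list index (0..2) or 5-bit mode code (0..31)


# Which syntax elements each level encrypts: (luma IPM, DCT coefficient signs).
LEVELS = {
    "lightweight": (True, False),
    "medium": (False, True),
    "heavyweight": (True, True),
}


def apply_level(level, ipms, signs, key, decrypt=False):
    """Encrypt (or decrypt) luma IPMs and sign bits for one encryption level."""
    do_ipm, do_sign = LEVELS[level]
    ks = key_segments(key, len(ipms) + len(signs))
    ipm_keys, sign_keys = ks[:len(ipms)], ks[len(ipms):]

    new_ipms = []
    for pu, k in zip(ipms, ipm_keys):
        if not do_ipm:
            new_ipms.append(pu)
        elif pu.in_candidate_list:
            k3 = k % 3  # Eq. (4): K_i in [0, 2]
            idx = (pu.idx - k3) % 3 if decrypt else (pu.idx + k3) % 3
            new_ipms.append(LumaIPM(True, idx))
        else:
            # Eq. (5): 5-bit XOR with K_i in [0, 31]; XOR is its own inverse.
            new_ipms.append(LumaIPM(False, pu.idx ^ (k & 0x1F)))

    # Eq. (6): flip each sign bit with a 1-bit key segment K_i in [0, 1].
    new_signs = [s ^ (k & 1) if do_sign else s for s, k in zip(signs, sign_keys)]
    return new_ipms, new_signs
```

Decryption reuses the same key segments: the modulo-3 addition is undone by subtraction, and both XORs are self-inverse, so `apply_level(level, *apply_level(level, ipms, signs, key), key, decrypt=True)` restores the inputs. Reducing a key byte modulo 3 is slightly biased because 256 is not a multiple of 3; a real implementation would draw K_i uniformly from [0, 2].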
5. Experimental Results
In this section, the performance of the proposed multi-level encryption is analysed. A set of benchmark video sequences used in the HEVC standardization process is shown in Figure 4. The resolution of the video sequences is 352 × 288, and the frame rate is 60 fps. Sample video frames after encryption and decryption are shown in Figure 5. All experiments were performed on a personal computer configured with an Intel(R) Core(TM) i5-4590 CPU @ 2.60 GHz and 16 GB of memory, running Windows 10, Visual Studio 2019, MATLAB 2018a and OpenCV 2.4.9. The HEVC reference software HM 16.9 is used to implement the proposed scheme. The quantization parameter (QP) is set to 10, 25 and 40.
Figure 5: Proposed encryption approach applied to stream Akiyo.
As analysed in Section 4, encrypting a single syntax element yields an encrypted video that still retains a large amount of visual information [42]. According to the amount of visual information, the encryption is divided into three levels to meet the requirements of users. The main experiments on the proposed scheme cover two aspects: visual security and encryption security. The proposed multi-level encryption is not compared with state-of-the-art algorithms, because the proposed scheme deliberately exposes part of the visual information while the other algorithms reveal none. The three encryption levels contain different amounts of visual information, so the evaluation indexes of the different levels should be clearly separated from each other. The experiments performed in this paper show that the indexes are indeed well separated, as the proposed scheme intends. To illustrate the effect of the three encryption levels, the encrypted and decrypted sequences are shown in Figure 5. For each video sequence, the encryption effects on I frames and B frames are displayed in the first and second rows of Figure 5, respectively. Akiyo and Bowing perform similarly in their I and B frames.

5.1 Analysis of Subjective Vision
To obtain an encryption effect that retains some visual information, this paper encrypts the syntax element of the luma IPM, the DCT coefficient sign, or both. The purpose of the proposed method is to give video providers and users more options: there are three levels to choose from, and each level contains a different amount of visual information to meet their requirements. Here, Akiyo and Bowing are chosen for analysis. For each encryption level, the proposed scheme encodes and decodes a video of 60 frames.
The decoded video is depicted in
Figure 6: Comparison of the encryption results of Akiyo.
Table 3
The average PSNR (dB) and SSIM of 60 frames at the three encryption levels with different QP
Video | QP | PSNR Original | PSNR Lightweight | PSNR Medium | PSNR Heavyweight | SSIM Original | SSIM Lightweight | SSIM Medium | SSIM Heavyweight
Akiyo | 10 | 50.5860 | 15.7597 | 11.4146 | 11.2704 | 0.9946 | 0.6102 | 0.4100 | 0.3955
Akiyo | 25 | 43.4328 | 15.9601 | 12.2069 | 11.4938 | 0.9808 | 0.6712 | 0.5112 | 0.5040
Akiyo | 40 | 34.8322 | 14.7331 | 11.1179 | 10.9469 | 0.9330 | 0.6472 | 0.5237 | 0.5076
Bowing | 10 | 49.8118 | 13.1044 | 10.8552 | 10.2631 | 0.9928 | 0.5949 | 0.4589 | 0.4423
Bowing | 25 | 44.4147 | 13.0592 | 11.9812 | 10.7306 | 0.9859 | 0.5965 | 0.5428 | 0.4931
Bowing | 40 | 34.4788 | 13.0932 | 10.6565 | 10.3354 | 0.9352 | 0.6053 | 0.5465 | 0.5463
Deadline | 10 | 48.9594 | 13.2268 | 11.1370 | 11.0232 | 0.9944 | 0.4129 | 0.2471 | 0.2296
Deadline | 25 | 40.5181 | 12.8611 | 11.3648 | 11.1794 | 0.9781 | 0.4715 | 0.3091 | 0.2826
Deadline | 40 | 30.6408 | 13.5111 | 12.2058 | 11.9644 | 0.8832 | 0.5085 | 0.3496 | 0.3239
Irene | 10 | 48.8314 | 15.8371 | 10.9286 | 10.9023 | 0.9906 | 0.5479 | 0.3238 | 0.3157
Irene | 25 | 41.1426 | 16.2709 | 12.0858 | 11.9623 | 0.9705 | 0.6200 | 0.4781 | 0.4753
Irene | 40 | 32.6193 | 16.2759 | 11.8398 | 11.7220 | 0.8828 | 0.6177 | 0.4998 | 0.4993
Mother | 10 | 49.5047 | 12.4687 | 12.2353 | 11.3869 | 0.9922 | 0.5004 | 0.4246 | 0.3798
Mother | 25 | 42.6266 | 11.8118 | 10.5153 | 9.5799 | 0.9730 | 0.5307 | 0.5221 | 0.4895
Mother | 40 | 34.3846 | 11.8778 | 11.5669 | 11.1146 | 0.8822 | 0.5238 | 0.5135 | 0.4816
Paris | 10 | 48.9832 | 12.2159 | 11.2651 | 11.2066 | 0.9951 | 0.3991 | 0.2164 | 0.1998
Paris | 25 | 39.4438 | 12.5289 | 11.0768 | 10.9711 | 0.9776 | 0.4439 | 0.2543 | 0.2347
Paris | 40 | 29.0746 | 12.7116 | 11.2934 | 11.2086 | 0.8689 | 0.4473 | 0.2790 | 0.2592
Foreman | 10 | 49.1742 | 10.7477 | 10.7151 | 10.4624 | 0.9932 | 0.4458 | 0.3646 | 0.3375
Foreman | 25 | 40.0126 | 10.9497 | 10.9346 | 10.8913 | 0.9612 | 0.4709 | 0.4219 | 0.4104
Foreman | 40 | 31.6714 | 11.2829 | 10.7100 | 10.4474 | 0.8646 | 0.5093 | 0.4604 | 0.4223
Football | 10 | 49.4153 | 13.0439 | 12.0941 | 11.6432 | 0.9955 | 0.5676 | 0.2491 | 0.2406
Football | 25 | 38.6179 | 13.2787 | 11.9104 | 11.8104 | 0.9628 | 0.5882 | 0.5691 | 0.2554
Football | 40 | 28.1894 | 13.3560 | 12.2711 | 11.6925 | 0.7419 | 0.4888 | 0.3230 | 0.3061
Pamphlet | 10 | 49.5828 | 11.9953 | 11.4926 | 11.1313 | 0.9946 | 0.4731 | 0.2689 | 0.2508
Pamphlet | 25 | 43.0321 | 12.5208 | 10.7428 | 10.6312 | 0.9851 | 0.5674 | 0.4082 | 0.3829
Pamphlet | 40 | 33.0452 | 11.8277 | 12.5812 | 11.3038 | 0.9053 | 0.5502 | 0.4173 | 0.3853
Container | 10 | 49.3303 | 11.3327 | 11.2844 | 10.8286 | 0.9935 | 0.5660 | 0.3623 | 0.3445
Container | 25 | 44.2166 | 12.3370 | 11.3169 | 11.2237 | 0.9587 | 0.5825 | 0.4027 | 0.3914
Container | 40 | 31.5489 | 11.6727 | 11.2796 | 11.0621 | 0.8656 | 0.5660 | 0.4498 | 0.4398

The outline of the object can be easily seen, and one can perceive the movement and morphological characteristics of the people in the video. In the heavyweight encryption level, clearly no visual information can be found in the decoded video. Hence, at each encryption level, users obtain a different amount of information from the encrypted video. By contrast, almost all existing HEVC encryption algorithms, such as the algorithm proposed by Peng et al. [20], encrypt the whole video, so that a user without permission cannot obtain any viewable information; its encryption effect is shown in the last column of Figure 6. Therefore, the proposed scheme can meet the different requirements of users.

Table 4
The average entropy of three levels of algorithms
Columns: entropy of the original video and of the lightweight, medium and heavyweight encryption levels. Sequences: Akiyo, Bowing, Deadline, Irene, Mother, Paris, Foreman, Football, Pamphlet, Container.

5.2 Objective Evaluation Index Analysis
To verify the analysis in Section 5.1, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are used to measure the performance of the three encryption levels [43].
PSNR measures image distortion, while SSIM measures image similarity. The smaller the values of the two indicators, the greater the distortion of the frame, which also means the frame carries less visual information. Table 3 shows the average PSNR and SSIM of the three encryption levels on 10 videos; each video contains 60 frames so that the results are more objective. For most sequences, the values decrease from left to right. This agrees with the subjective analysis in Section 5.1, where the visual information decreases from the lightweight to the heavyweight encryption level.

In the video coding process, various QP values can be selected; the smaller the QP, the finer the coded frame. As shown in Table 3, as QP increases, the PSNR of the original video decreases rapidly because image quality is seriously degraded, even though the picture can still be seen. However, once the video frames have been encrypted, the PSNR values do not fluctuate much with QP. The reason is that once an image has been encrypted to a certain extent, PSNR is no longer a sensitive measure of image quality, although it can still distinguish frames with different amounts of visual information. SSIM behaves differently from PSNR: as QP increases, the SSIM of the original video changes relatively little, whereas encryption changes SSIM substantially. Because SSIM measures the structural similarity between video frames and encryption breaks the structure of the frame, the SSIM value changes markedly.
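For reference, the two indicators can be computed as in the sketch below, assuming 8-bit luma frames given as nested lists of pixel rows. PSNR follows the usual definition 10·log10(255²/MSE). The SSIM function is a deliberate simplification: standard SSIM averages this statistic over local (typically Gaussian-weighted) windows, whereas the hypothetical `global_ssim` evaluates it once over the whole frame, so its values need not match Table 3. The function names are hypothetical.

```python
import math


def _flatten(frame):
    """Turn a list of pixel rows into a flat list of floats."""
    return [float(v) for row in frame for v in row]


def psnr(ref, test, max_val=255.0):
    """PSNR in dB between two equally sized frames."""
    x, y = _flatten(ref), _flatten(test)
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)


def global_ssim(ref, test, max_val=255.0):
    """SSIM statistic evaluated once over the whole frame (simplified)."""
    x, y = _flatten(ref), _flatten(test)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2  # usual stabilizers
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )
```

Both functions fall as distortion grows, which matches the reading of Table 3: lower PSNR and SSIM mean less visual information survives in the frame.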
Figure 7: Key sensitivity test.
The information entropy H(I) of a frame I is defined as follows:
$$ H(I) = -\sum_{j=0}^{L-1} P(I_j) \log_2 P(I_j) \tag{7} $$
where L represents the number of possible pixel values, and P(I_j) represents the probability of the pixel value I_j. When all possible values are equally probable, the entropy reaches its maximum, which is 8 for 8-bit frames (L = 256). The closer the entropy of the encrypted image is to 8, the better the encryption performance. The information entropy of the encrypted frames is listed in Table 4. The entropy values increase from left to right, indicating that the encryption performance gradually increases. Correspondingly, from the lightweight to the heavyweight encryption level, the visual information gradually decreases. These findings further confirm the subjective analysis in Section 5.1.
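Eq. (7) can be evaluated directly from a frame's pixel histogram, as in the minimal sketch below. Frames are assumed to be nested lists of 8-bit values, and `frame_entropy` is a hypothetical name. Values that never occur are skipped, following the convention 0·log 0 = 0, so a frame that uses all 256 values equally often reaches the maximum of 8 bits.

```python
import math
from collections import Counter


def frame_entropy(frame):
    """Shannon entropy H(I), in bits, of the pixel values of one frame (Eq. (7))."""
    pixels = [v for row in frame for v in row]
    total = len(pixels)
    counts = Counter(pixels)  # only values that occur; zero-probability terms add nothing
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A constant frame gives 0 bits and a frame split evenly between two values gives 1 bit, which brackets the range in which the encrypted frames of Table 4 fall.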
Preprint submitted to Elsevier
Table 5
The average NPCR and UACI of the three encryption levels
Video | NPCR: Original | NPCR: Lightweight Encryption | NPCR: Medium Encryption | NPCR: Heavyweight Encryption | UACI: Original | UACI: Lightweight Encryption | UACI: Medium Encryption | UACI: Heavyweight Encryption
Akiyo
Bowing
Deadline
Irene
Mother
Paris
Foreman
Football
Pamphlet
Container
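The NPCR and UACI values reported in Table 5 follow Eqs. (8) to (12) of the NPCR and UACI analysis. The following minimal sketch implements those formulas, assuming each cipher-frame is a list of rows of 8-bit values (a hypothetical layout; the authors' evaluation code is not given). The functions return fractions rather than percentages, matching the expected values 0.996094 and 0.334635 quoted in the text.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # hypothetical layout: a cipher-frame as rows of pixel values


def _flatten_pair(c1: Frame, c2: Frame) -> Tuple[List[int], List[int]]:
    a = [p for row in c1 for p in row]
    b = [p for row in c2 for p in row]
    if not a or len(a) != len(b):
        raise ValueError("cipher-frames must be non-empty and the same size")
    return a, b


def npcr(c1: Frame, c2: Frame) -> float:
    """Eqs. (8)-(9) as a fraction: share of positions where P(n, m) = 1."""
    a, b = _flatten_pair(c1, c2)
    return sum(x != y for x, y in zip(a, b)) / len(a)


def uaci(c1: Frame, c2: Frame, levels: int = 256) -> float:
    """Eq. (10) as a fraction: summed absolute difference over (L - 1) * T."""
    a, b = _flatten_pair(c1, c2)
    return sum(abs(x - y) for x, y in zip(a, b)) / ((levels - 1) * len(a))


def npcr_expected(levels: int = 256) -> float:
    """Eq. (11) without the 100% factor: 1 - 1 / 2^(log2 L), i.e. 1 - 1/L."""
    return 1 - 1 / 2 ** math.log2(levels)


def uaci_expected(levels: int = 256) -> float:
    """Eq. (12) without the 100% factor."""
    return sum(v * (v + 1) / (levels - 1) for v in range(1, levels)) / levels ** 2


if __name__ == "__main__":
    print(round(npcr_expected(), 6), round(uaci_expected(), 6))  # 0.996094 0.334635
```

Multiplying the returned fractions by 100 gives the percentages in Eqs. (8), (10), (11) and (12). For $L = 256$, Eq. (12) simplifies to $(L+1)/(3L) = 257/768$.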
5.5 NPCR and UACI Analysis
The number of pixels change rate (NPCR) [46] and the unified average changing intensity (UACI) [47] are used to evaluate the ability to resist differential attacks. NPCR measures the number of pixels that differ between two images, while UACI measures the average intensity of the differences between their pixel values. For two cipher-frames $I_1$ and $I_2$, they are defined as follows:
$$NPCR(I_1, I_2) = \frac{\sum_{n,m} P(n,m)}{T} \times 100\%, \tag{8}$$
$$P(n,m) = \begin{cases} 0, & \text{if } I_1(n,m) = I_2(n,m), \\ 1, & \text{if } I_1(n,m) \neq I_2(n,m), \end{cases} \tag{9}$$
$$UACI(I_1, I_2) = \frac{\sum_{n,m} \left| I_1(n,m) - I_2(n,m) \right|}{(L-1) \times T} \times 100\%, \tag{10}$$
where $T$ is the total number of pixels in each cipher-frame, $L$ is the number of allowed pixel values, $P(n,m)$ indicates whether $I_1$ and $I_2$ differ at position $(n,m)$, and $I_1(n,m)$ and $I_2(n,m)$ are the pixel values of the two cipher-frames at position $(n,m)$. The expected values of NPCR and UACI are given by
$$NPCR_{expected} = \left(1 - \frac{1}{2^{\log_2 L}}\right) \times 100\%, \tag{11}$$
$$UACI_{expected} = \frac{1}{L^2} \sum_{v=1}^{L-1} \frac{v(v+1)}{L-1} \times 100\%. \tag{12}$$
As the test frames are 8-bit images ($L = 256$), Eqs. (11) and (12) give expected NPCR and UACI values of 0.996094 and 0.334635, respectively (that is, 99.6094% and 33.4635%). When the values of
NPCR and UACI are closer to the expected values, the encryption performance is better. The NPCR and UACI results of the frames encrypted at the three encryption levels are shown in Table 5. From the lightweight to the heavyweight encryption level, the NPCR and UACI values of each video increase gradually, which means that the resistance to differential attacks becomes increasingly strong. This shows that the encryption performance of the three levels improves step by step as the remaining visual information decreases. Another reason is that the syntax element of the DCT coefficient sign plays a more important role than the luma IPM, and encrypting multiple syntax elements yields better encryption performance than encrypting a single one.

5.6 Histogram Analysis
The histogram of a video frame reflects the frequency distribution of its pixel values. For good encryption performance, the histograms of the original and encrypted video frames should differ from each other, and the more they differ, the higher the security of the system [25, 40]. In the proposed scheme, however, the histograms are not entirely different at every level, because at the lightweight and medium encryption levels the video frames are not heavily encrypted and still retain a certain amount of visual information. The histograms of the video frames encrypted at the three encryption levels are
shown in Figure 8. The histograms of the original video frame and of the frames after lightweight, medium, and heavyweight encryption are presented in Figure 8(a), (b), (c), and (d), respectively. The pixel distributions in Figure 8(a) and Figure 8(b) are somewhat similar, which means that a frame after lightweight encryption still retains a certain amount of visual information. Figure 8(a) and Figure 8(c) also bear some resemblance, so some visual information can still be obtained from a frame after medium encryption. In contrast, the histograms in Figure 8(a) and Figure 8(d) are entirely different, which shows that heavyweight encryption performs well. These results further confirm the subjective visual analysis in Section 5.1, in which the visual information decreases step by step from the lightweight to the heavyweight encryption level.

Figure 8: Histogram analysis for Akiyo with three encryption levels; the original frames are shown in Fig. 4. (a) Histogram of the frame in Fig. 4(a). (b) Histogram of the frame in Fig. 4(b). (c) Histogram of the frame in Fig. 4(c). (d) Histogram of the frame in Fig. 4(d).

Table 6
The average bit rate increment of the algorithms
Original video | Akiyo | Bowing | Deadline | Irene | Mother
Bit rate change
Original video | Paris | Foreman | Football | Pamphlet | Container
Bit rate change

5.7 The Security of Key Stream Analysis
In the proposed scheme, AES, as standardized by the U.S. National Institute of Standards and Technology (NIST), is adopted to generate the pseudo-random sequences. These sequences can be considered highly secure because no practical attack that breaks AES is known to date. AES keys of 192 or 256 bits are considered secure for protecting the information to be encrypted, even when only a small amount of information is encrypted. Furthermore, other generators, such as the Rossler chaotic system [15] and the 2D logistic-adjusted-sine map [46], can also be used to generate the pseudo-random sequence for the proposed scheme. In other words, the sequence generation does not depend on the encrypted content, so the security of the scheme depends on the security of the generating algorithm.
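Section 5.7 relies on AES in counter (CTR) mode [39] as the keystream generator. The sketch below illustrates only the counter-mode structure: keystream block $i$ is $F_K(\text{nonce} \| i)$, and each keystream bit is XORed with one bin. Because Python's standard library has no AES, SHA-256 of key, nonce and counter stands in for the block cipher $F_K$; this is a substitution for illustration only, not the scheme's cipher, and a real deployment should use a vetted AES-CTR implementation. The function names and the XOR of sign bins (0 for a positive coefficient, 1 for a negative one) are hypothetical, not the authors' exact procedure.

```python
import hashlib
from typing import List


def ctr_keystream(key: bytes, nonce: bytes, nbytes: int) -> bytes:
    """Counter-mode keystream: F(key, nonce || counter) for counter = 0, 1, 2, ...

    F is SHA-256 here, a stand-in for the AES block cipher used by the scheme.
    """
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        block_input = key + nonce + counter.to_bytes(8, "big")
        out += hashlib.sha256(block_input).digest()  # one 32-byte keystream block
        counter += 1
    return bytes(out[:nbytes])


def keystream_bits(key: bytes, nonce: bytes, nbits: int) -> List[int]:
    """Unpack the keystream into individual bits, most significant bit first."""
    data = ctr_keystream(key, nonce, (nbits + 7) // 8)
    return [(data[i // 8] >> (7 - i % 8)) & 1 for i in range(nbits)]


def xor_sign_bins(sign_bins: List[int], key: bytes, nonce: bytes) -> List[int]:
    """Hypothetical sign-bin encryption: flip each bin where the keystream bit is 1.

    XOR is its own inverse, so calling this again with the same key and nonce decrypts.
    """
    bits = keystream_bits(key, nonce, len(sign_bins))
    return [s ^ b for s, b in zip(sign_bins, bits)]


if __name__ == "__main__":
    key, nonce = bytes(32), b"frame-0001"  # hypothetical 256-bit key and per-frame nonce
    signs = [0, 1, 1, 0, 1, 0, 0, 1]
    enc = xor_sign_bins(signs, key, nonce)
    assert xor_sign_bins(enc, key, nonce) == signs  # decryption restores the signs
```

Swapping the generator (for example, for a chaotic map) only changes `ctr_keystream`; the bin-level XOR is unaffected, which mirrors the point that the scheme's security rests on the generator.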
In this paper, we focus on the encryption performance of different syntax elements in encrypted H.265/HEVC, so the security of the adopted encryption algorithm is not discussed or tested in detail.

5.8 Bit Rate Change Analysis
Bit rate change is a significant index for video encryption algorithms [48]. Ideally, video encryption keeps the bit rate unchanged. In general, encrypting syntax elements coded in bypass mode keeps the bit rate, whereas encrypting syntax elements coded in regular mode inevitably increases it, because encryption disturbs the statistics that context modelling relies on. In the proposed scheme, the encrypted DCT coefficient sign is coded in bypass mode and the encrypted luma IPM is coded in regular mode. Since luma IPM is the only encrypted syntax element coded in regular mode, the bit rate change only needs to be calculated for the lightweight or heavyweight encryption level. The average bit rate increment is listed in Table 6. The bit rate increases only slightly, remaining below 2.5% in almost all cases, which demonstrates that the encryption has almost no impact on the video compression coding system.
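Section 5.8 does not state how the increment in Table 6 is computed. Assuming the usual relative change, $\Delta B = (B_{enc} - B_{orig}) / B_{orig} \times 100\%$, averaged over the test sequences, a minimal sketch is given below; the function names and the example sizes are hypothetical and are not data from Table 6.

```python
from typing import Iterable, Tuple


def bit_rate_change(original_bits: int, encrypted_bits: int) -> float:
    """Assumed definition: relative size change of the encrypted bitstream, in percent."""
    if original_bits <= 0:
        raise ValueError("original bitstream size must be positive")
    return (encrypted_bits - original_bits) * 100 / original_bits


def average_bit_rate_change(pairs: Iterable[Tuple[int, int]]) -> float:
    """Mean of the per-sequence changes, e.g. over the ten test videos."""
    changes = [bit_rate_change(o, e) for o, e in pairs]
    if not changes:
        raise ValueError("no sequences given")
    return sum(changes) / len(changes)


if __name__ == "__main__":
    # Hypothetical bitstream sizes in bits, for illustration only
    print(bit_rate_change(1_000_000, 1_020_000))                  # 2.0
    print(average_bit_rate_change([(1000, 1020), (2000, 2000)]))  # 1.0
```

Under this definition, the medium level would yield a change of 0 whenever only bypass-coded sign bins are altered, consistent with measuring the change only for the lightweight or heavyweight level.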
6. Conclusions
In this paper, we examine the real-world requirements of video encryption and analyse the conflicting interests of users and video providers. To meet the various requirements of users and video providers, to the benefit of both, a multi-level selective encryption scheme for H.265/HEVC is proposed based on encrypting syntax elements in the coding process. The scheme offers users three levels: the lightweight encryption level, the medium encryption level, and the heavyweight encryption level. The syntax element of luma IPM is encrypted at the lightweight encryption level, the syntax element of the DCT coefficient sign is encrypted at the medium encryption level, and both syntax elements are encrypted at the heavyweight encryption level. Since only a single syntax element is encrypted at the lightweight and medium levels, users can still obtain some visual information from videos encrypted at these levels. The experimental results and analysis confirm that the amount of visual information retained differs across the three encryption levels and decreases successively from the lightweight to the heavyweight encryption level, which positions the approach as a solution to the conflict between supply and demand that exists between users and video providers.
References
[1] D. Grois, D. Marpe, A. Mulayoff, B. Itzhaky, O. Hadar, Performance comparison of h.265/mpeg-hevc, vp9, and h.264/mpeg-avc encoders, in: 2013 IEEE Picture Coding Symposium (PCS), 2013, pp. 394–397.
[2] T. Li, H. Wang, Y. Chen, L. Yu, Fast depth intra coding based on spatial correlation and rate distortion cost in 3d-hevc, Signal Process. Image Commun. 80 (2020).
[3] A. Sasithradevi, S. M. M. Roomi, Video classification and retrieval through spatio-temporal radon features, Pattern Recognit. 99 (2020).
[4] Y. Fang, G. Ding, W. Wen, F. Yuan, Y. Yang, Z.-J. Fang, W. Lin, Salient object detection by spatiotemporal and semantic features in real-time video processing systems, IEEE Trans. Ind. Electron. (2019).
[5] W. Wen, K. Wei, Y. Zhang, Y. Fang, M. Li, Colour light field image encryption based on dna sequences and chaotic systems, Nonlinear Dynam. 99 (2020) 1587–1600.
[6] B. Sridhar, A wavelet based watermarking in video using layer fusion technique, Pattern Recognition and Image Analysis 28 (2018) 537–545.
[7] R. Duvar, O. Akbulut, O. Urhan, Fast inter mode decision exploiting intra-block similarity in hevc, Signal Process. Image Commun. 78 (2019) 503–510.
[8] Z. Peng, C. Huang, F. Chen, G. Jiang, X. Cui, M. Yu, Multiple classifier-based fast coding unit partition for intra coding in future video coding, Signal Process. Image Commun. 78 (2019) 171–179.
[9] X. Yao, Z. Chen, Y. Tian, A lightweight attribute-based encryption scheme for the internet of things, Future Gener. Comput. Syst. 49 (2015) 104–112.
[10] L. Qiao, K. Nahrstedt, A new algorithm for mpeg video encryption, in: Proc. of First International Conference on Imaging Science System and Technology, 1997, pp. 21–29.
[11] J. Shah, D. Saxena, Video encryption: A survey, ArXiv Preprint arXiv:1104.0800 (2011).
[12] L. Shiguo, L. Zhongxuan, R. Zhen, W. Haila, Secure advanced video coding based on selective encryption algorithms, IEEE Trans. Consumer Electron. 52 (2006) 621–629.
[13] G. Van Wallendael, J. De Cock, S. Van Leuven, A. Boho, P. Lambert, B. Preneel, R. Van de Walle, Format-compliant encryption techniques for high efficiency video coding, in: 2013 IEEE International Conference on Image Processing, 2013, pp. 4583–4587.
[14] G. Van Wallendael, A. Boho, J. De Cock, A. Munteanu, R. Van de Walle, Encryption for high efficiency video coding with video adaptation capabilities, IEEE Trans. Consumer Electron. 59 (2013) 634–642.
[15] F. Peng, H. Li, M. Long, An effective selective encryption scheme for hevc based on rossler chaotic system, in: Proc. of International symposium on Nonlinear Theory and its Applications, 2015, pp. 1–4.
[16] W. Wang, M. Hempel, D. Peng, H. Wang, H. Sharif, H.-H. Chen, On energy efficient encryption for video streaming in wireless sensor networks, IEEE Trans. Multimedia 12 (2010) 417–426.
[17] Y. Zhao, L. Zhuo, M. Niansheng, J. Zhang, X. Li, An object-based unequal encryption method for h.264 compressed surveillance videos, in: 2012 IEEE International Conference on Signal Processing, Communication and Computing (ICSPCC 2012), 2012, pp. 419–424.
[18] V. A. Memos, K. E. Psannis, Encryption algorithm for efficient transmission of hevc media, J. Real-Time Image Process. 12 (2016).
[19] B. Boyadjis, C. Bergeron, B. Pesquet-Popescu, F. Dufaux, Extended selective encryption of h.264/avc (cabac)- and hevc-encoded video streams, IEEE Trans. Circuits Syst. Video Technol. 27 (2016) 892–906.
[20] F. Peng, X. Zhang, Z. Lin, M. Long, A tunable selective encryption scheme for h.265/hevc based on chroma ipm and coefficient scrambling, IEEE Trans. Circuits Syst. Video Technol. (2019).
[21] N. Sklavos, O. G. Koufopavlou, Architectures and vlsi implementations of the aes-proposal rijndael, IEEE Trans. Comput. 51 (2002) 1454–1459.
[22] W. F. Ehrsam, S. M. Matyas, C. H. Meyer, W. L. Tuchman, A cryptographic key management scheme for implementing the data encryption standard, Ibm Syst. J. 17 (1978) 106–125.
[23] W. Hamidouche, M. Farajallah, N. O. Sidaty, S. E. Assad, O. Déforges, Real-time selective video encryption based on the chaos system in scalable hevc extension, Signal Process. Image Commun. 58 (2017) 73–86.
[24] Y. Tew, K. Minemura, K. Wong, Hevc selective encryption using transform skip signal and sign bin, in: 2015 IEEE Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015, pp. 963–970.
[25] A. I. Sallam, O. S. Faragallah, E.-S. M. El-Rabaie, Hevc selective encryption using rc6 block cipher technique, IEEE Trans. Multimedia 20 (2018) 1636–1644.
[26] K. Yang, S. Wan, Y. Gong, H. R. Wu, Y. Feng, An efficient lagrangian multiplier selection method based on temporal dependency for rate-distortion optimization in h.265/hevc, Signal Process. Image Commun. 57 (2017) 68–75.
[27] Z. Liu, S. Duan, P. Zhou, B. Wang, Traceable-then-revocable ciphertext-policy attribute-based encryption scheme, Future Gener. Comput. Syst. 93 (2019) 903–913.
[28] S. S. Maniccam, N. G. Bourbakis, Lossless image compression and encryption using scan, Pattern Recognit. 34 (2001) 1229–1245.
[29] F. Peng, X. Zhu, M. Long, An roi privacy protection scheme for h.264 video based on fmo and chaos, IEEE Trans. Inf. Forensics and Security 8 (2013) 1688–1699.
[30] K. Minemura, K. Wong, R. C. Phan, K. Tanaka, A novel sketch attack for h.264/avc format-compliant encrypted video, IEEE Trans. Circuits Syst. Video Technol. 27 (2016) 2309–2321.
[31] F. Peng, X. Gong, M. Long, X. Sun, A selective encryption scheme for protecting h.264/avc video in multimedia social network, Multimedia Tools Appl. 76 (2017) 3235–3253.
[32] M. Gao, X. Fan, D. Zhao, W. Gao, An enhanced entropy coding scheme for hevc, Signal Process. Image Commun. 44 (2016) 108–123.
[33] W. Shen, Y. Fan, Y. Bai, L. Huang, Q. Shang, C. Liu, X. Zeng, A combined deblocking filter and sao hardware architecture for hevc, IEEE Trans. Multimedia 18 (2016) 1022–1033.
[34] S. Jaballah, M.-C. Larabi, J. B. Tahar, Low complexity intra prediction mode decision for 3d-hevc depth coding, Signal Process. Image Commun. 67 (2018) 34–47.
[35] C. Fu, H. Chen, Y.-L. Chan, S.-H. Tsang, X. Zhu, Early termination for fast intra mode decision in depth map coding using dis-inheritance, Signal Process. Image Commun. (2020).
[36] N.-U. Kim, S. C. Lim, J. W. Kang, H. Y. Kim, Y.-L. Lee, Transform with residual rearrangement for hevc intra coding, Signal Process. Image Commun. 78 (2019) 322–330.
[37] Y. Fang, X. Zhang, F. Yuan, N. Imamoglu, H. Liu, Video saliency detection by gestalt theory, Pattern Recognit. (2019).
[38] D. F. de Souza, A. Ilic, N. Roma, L. Sousa, Ghevc: An efficient hevc decoder for graphics processing units, IEEE Trans. Multimedia 19 (2016) 459–474.
[39] H. Lipmaa, P. Rogaway, D. Wagner, Ctr-mode encryption, in: First NIST Workshop on Modes of Operation, volume 39, 2000.
[40] Z. Shahid, W. Puech, Visual protection of hevc video by selective encryption of cabac binstrings, IEEE Trans. Multimedia 16 (2014) 24–36.
[41] M. Zhou, Q. Mao, C. Zhong, W. Zhang, C. Chen, Spatial error concealment by jointing gauss bayes model and svd for high efficiency video coding, Int. J. Pattern Recognit. Artif. Intell. 33 (2019).
[42] X. Chang, X. Liang, Y. Yan, L. Nie, Guest editorial: Image/video understanding and analysis, Pattern Recognit. Lett. 130 (2020) 1–3.
[43] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: From error visibility to structural similarity, IEEE Trans. Image Process. 13 (2004) 600–612.
[44] Y. Wu, Y. Zhou, G. Saveriades, S. Agaian, J. P. Noonan, P. Natarajan, Local shannon entropy measure with statistical tests for image randomness, Inform. Sci. 222 (2013) 323–342.
[45] W. Wen, R. Tu, K. Wei, Video frames encryption based on dna sequences and chaos, in: Eleventh International Conference on Digital Image Processing (ICDIP 2019), volume 11179, International Society for Optics and Photonics, 2019, p. 111792T.
[46] Z. Hua, Y. Zhou, Image encryption using 2d logistic-adjusted-sine map, Inform. Sci. 339 (2016) 237–253.
[47] Z. Hua, S. Yi, Y. Zhou, Medical image encryption using high-speed scrambling and pixel adaptive diffusion, Signal Process. 144 (2018) 134–144.
[48] H. Dong, D. K. Prasad, I.-M. Chen, Accurate detection of ellipses with false detection control at video rates using a gradient analysis, Pattern Recognit. 81 (2018) 112–130.