Selective Encryption of VVC Encoded Video Streams For The Internet of Video Things
Abstract— Visual sensors serve as a critical component of the Internet of Things (IoT). There is an ever-increasing demand for broad applications and higher video resolutions for cameras in smart homes and smart cities, such as security cameras. To utilize the large volume of video data generated by networks of visual sensors for various machine vision applications, the data must be compressed and securely transmitted over the Internet. H.266/VVC, the newest compression standard, provides the highest compression for visual data. To provide security along with high compression, a selective encryption method for hiding the information in videos is presented for this new compression standard. Selective encryption methods lower the computational overhead of encryption while preserving the video bitstream format, which is useful when the video passes through untrusted blocks such as transcoding or watermarking. Syntax elements that carry considerable information are selected for encryption, i.e., luma Intra Prediction Modes (IPMs), Motion Vector Differences (MVDs), and residual signs. The results of the proposed method are then investigated in terms of visual security and bit rate change. Our experiments show that the encrypted videos provide higher visual security than similar works in previous standards, and that integrating the presented encryption scheme into the VVC encoder has little impact on bit rate efficiency (a 2% to 3% bit rate increase).

Index Terms— Selective Encryption, H.266/VVC, IoVT, Security, Video encryption

I. INTRODUCTION

… videos would require significant considerations [1]. Considering the growing demand for IoVT and video streaming, the amount of video data on the Internet is rapidly increasing. In [2], it is predicted that by 2030 there will be 13 billion cameras around the world, generating a huge amount of data. Cisco predicts that in 2022, 82% of IP traffic will be video content [3]. Another report indicates a significant demand for video content in future smart homes [4]; Fig. 1 shows the estimated bandwidth requirements from this report (applications are sorted by their predicted time of appearance; the higher an application is in the figure, the earlier it is expected to appear). With upcoming video technologies and the demand for higher bandwidth, efficient solutions are needed for the privacy, storage, and transmission of video data. There are two general methods to encrypt a video prior to transmission: naive encryption and selective encryption. In naive encryption, the whole video bitstream is encrypted; with a secure encryption algorithm, this type of video encryption completely hides any information within the video. In contrast, selective encryption encrypts only some of the video elements. Fig. 2 shows an overview of a simplified IoT environment with visual sensors. Depending on the application and scenario, some of the computations, such as encryption and compression, can be processed at the edge. Then, the video data can be transmitted to the cloud for other …
[6] | H.264/AVC, H.265/HEVC | AES-CTR | Luma IPM, MVD sign and values, MV reference idx, Merge idx, MVP idx, Residual sign and values, SAO filter | Yes | Yes
[9] | MPEG-1 | DES-CBC | Headers, DCT coefficients, or whole bitstream (based on security levels) | No | No
[10] | N/A | PRNG | Pixels | No | Yes
[14] | H.265/HEVC | AES | MVD sign and values, Residual sign and values, delta QP, reference pic idx, Merge idx, MVP idx, SAO parameter | Yes | Yes
[15] | H.265/HEVC | AES-CTR | Luma and chroma IPM, QTC, MVD signs and values | Yes | Yes
[16] | H.265/HEVC | RC4 | QTC, MVD sign and values, IPM luma | Yes | Yes
[17] | H.265/HEVC | AES-CTR | Chroma and Luma IPM, Residual Sign and values, MVD sign and values, Merge Idx, MVP idx, Reference frame idx, SAO parameter | Yes | Yes
… encoder, such as MVD values, delta QP, and residual components. They also discussed the effect of encryption on the encoder bit rate and on the visual scrambling of output frames. In [6], the authors presented an SE algorithm for Context-Adaptive Binary Arithmetic Coding (CABAC) in H.264/AVC and … their work shows the impact of selective encryption on the encoder bit rate and on the visual distortion of video frames.

[Fig. 3: Encoder block diagram with the encryption stage placed before binarization. Blocks: Input Video, Transformation, Quantization, Inv. Quantization, Inv. Transformation, Intra Prediction, Encryption, Binarization, CABAC, Decoded Picture Buffer, Video Stream.]
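Fig. 3 places the encryption stage before binarization, and Section III-B describes XOR-ing selected bypass-coded syntax elements with a keystream while keeping range-limited values (such as the luma MPM index, 0 to 5) inside their valid range. The following is a minimal sketch of that idea under explicit assumptions, not the authors' implementation: HMAC-SHA256 from the Python standard library stands in for the AES-CTR keystream the paper recommends (only to keep the example dependency-free); the nonce combines the unit's (x, y) position, as suggested in the text, with a frame index and a syntax-element id, which are our additions; and all function names are hypothetical.

```python
import hashlib
import hmac
import struct

MPM_IDX_MAX = 5    # six MPMs -> index 0..5
REM_MODE_MAX = 60  # 67 - 6 = 61 remaining luma modes -> index 0..60


def keystream_bits(key: bytes, frame: int, x: int, y: int, element: int, nbits: int) -> int:
    """Return the last `nbits` bits (nbits <= 256) of one keystream block.

    Stand-in for AES-CTR: HMAC-SHA256 over a nonce packed from
    (frame, x, y, element). The frame index and element id are assumptions
    added so that keystream is not reused across frames or syntax elements.
    """
    nonce = struct.pack(">IIII", frame, x, y, element)
    block = hmac.new(key, nonce, hashlib.sha256).digest()
    return int.from_bytes(block, "big") & ((1 << nbits) - 1)


def xor_in_range(value: int, ks: int, max_value: int) -> int:
    """XOR only if the result stays in [0, max_value]; otherwise keep `value`.

    Calling this again with the same keystream bits undoes it, so the same
    function serves for encryption and decryption.
    """
    candidate = value ^ ks
    return candidate if candidate <= max_value else value


def crypt_luma_ipm(key: bytes, frame: int, x: int, y: int, is_mpm: bool, idx: int) -> int:
    """Encrypt or decrypt a luma IPM index before binarization."""
    if is_mpm:
        ks = keystream_bits(key, frame, x, y, element=0, nbits=3)  # 3 bits cover 0..5
        return xor_in_range(idx, ks, MPM_IDX_MAX)
    ks = keystream_bits(key, frame, x, y, element=1, nbits=6)      # 6 bits cover 0..60
    return xor_in_range(idx, ks, REM_MODE_MAX)


def crypt_sign_pattern(key: bytes, frame: int, x: int, y: int, pattern: int, num_signs: int) -> int:
    """XOR a packed pattern of `num_signs` bypass-coded sign bits (any value is valid)."""
    return pattern ^ keystream_bits(key, frame, x, y, element=2, nbits=num_signs)


def truncated_unary(idx: int, c_max: int = MPM_IDX_MAX) -> str:
    """Truncated unary bins: `idx` ones followed by a zero unless idx == c_max."""
    return "1" * idx + ("0" if idx < c_max else "")


if __name__ == "__main__":
    key = bytes(16)  # demo key only
    enc = crypt_luma_ipm(key, frame=0, x=64, y=32, is_mpm=True, idx=2)
    dec = crypt_luma_ipm(key, frame=0, x=64, y=32, is_mpm=True, idx=enc)
    print("encrypted MPM idx:", enc, "bins:", truncated_unary(enc), "decrypted:", dec)
```

The conditional XOR is invertible: if v XOR k is in range, the ciphertext is v XOR k and XOR-ing it again returns v; otherwise the value is left unchanged, and the decoder, computing the same out-of-range candidate, also leaves it unchanged. Because the encrypted MPM index never exceeds 5, its truncated unary binarization still has at most five bins, which is what keeps the bitstream format compliant.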
… transformation, CUs are also divided into smaller Prediction Units (PUs) and Transformation Units (TUs), respectively. In VVC, however, the separation of CUs, PUs, and TUs is no longer used (except for CUs larger than the maximum transform length); instead, VVC uses CUs as the basic processing units. Moreover, two types of hierarchical trees are used for splitting CTUs: the quadtree (QT) and the multi-type tree (MTT). CTUs are first divided with a quaternary tree; then, if needed, the leaf nodes are further partitioned with the multi-type tree (the multi-type splitting types are shown in Fig. 4). As a result, the VVC encoder has a wide range of rectangular block shapes into which a CTU can be split, which brings higher compression (at a higher computational cost). Also, tiles, slices, and subpictures are separated in the bitstream, which is suitable for parallel processing in the encoder and decoder. This technique is also helpful for 360-degree videos, since it allows the receiver to decode only the regions of the video that the user is viewing.

For intra prediction, H.266/VVC has 67 intra modes, which makes prediction more accurate than in H.265/HEVC, which has 35 modes. Two of the 67 modes are the planar and DC modes; the rest are angular modes. In other respects, intra prediction in the new standard is similar to H.265/HEVC. Based on the upper and left neighbors of the current coding block, a list of six Most Probable Modes (MPMs) is constructed. If the current prediction mode is one of the MPM entries, the encoder binarizes its index using truncated unary coding; otherwise, its index among the remaining 61 modes is binarized. Because partitioning blocks in H.266/VVC are not necessarily square, some prediction angles extend beyond the conventional range of 45 degrees to -135 degrees, which allows more reference pixels to be used for prediction. For inter prediction, VVC has several new tools. One of the most interesting is affine motion compensation. In previous codecs such as H.265/HEVC, only translational (2D) object motion is modeled. However, motion in real videos is rarely purely translational; objects may also rotate or scale (zoom in or out) between frames. Affine motion compensation is therefore introduced to represent these more complex motions. Other newly introduced inter prediction tools include Merge Mode with MVD (MMVD), Symmetric MVD Coding, and Adaptive Motion Vector Resolution (AMVR). For more details on these features, readers can refer to the VTM algorithm description [21].

III. VVC SELECTIVE ENCRYPTION

A. Configuration, Software, and Test Video Sequences

A wide range of video sequences, from low resolution to 4K, with different frame rates and bit depths, was selected for our tests in order to investigate the results of the proposed SE algorithm (the selected test videos are listed in Table II). The reference software for H.266/VVC is VTM [18]; currently, its latest version is 10.2. Another VVC software, VVenC [19], is much faster than VTM. VTM can be regarded as the complete software with all of the tools and options, while VVenC provides a faster implementation of H.266/VVC. However, VVenC provides only the encoder; the decoder is not yet available. Thus, for simplicity, we used VVenC for encoding and VTM for decoding the video streams. We used the randomaccess_faster configuration of VVenC, which has a GOP size of 32 and an intra period of 32. In this case, each I frame is followed by 31 B frames. The base quantization parameter (QP) is set to 8, 24, or 40, depending on the test. The level, which determines the tier to which the encoded bitstream conforms, is chosen from Table 140 and Table 141 of [22] for each test video; its value depends on the video resolution (luma width and height) and the frame rate of the video under test.

TABLE II. VIDEO SEQUENCES USED FOR EXPERIMENTS
Video | Resolution | Frame Rate (fps) | Bit depth
Mobile | 352x288 | 24 | 8
BasketballPass | 416x240 | 50 | 8
BQMall | 832x480 | 60 | 8
Johnny | 1280x720 | 60 | 8
FourPeople | 1280x720 | 60 | 8
BasketballDrive | 1920x1080 | 50 | 8
PeopleOnStreet | 2560x1600 | 30 | 8
Traffic | 2560x1600 | 30 | 8
RaceNight | 3840x2160 | 50 | 10
HoneyBee | 3840x2160 | 120 | 10

B. Encryption of Syntax Elements

The encryption is applied prior to the binarization of syntax elements (as shown in Fig. 3). As in HEVC, Context-Adaptive Binary Arithmetic Coding (CABAC) in VVC has two encoding modes: regular and bypass. In bypass mode, all symbols are treated as equiprobable. In regular mode, a probability model is determined from the context of the elements, and encoding is performed based on this model. Encrypting syntax elements coded in bypass mode is therefore more suitable, since it does not affect the probability model; changing the probability model would lower the encoding efficiency. When encrypting elements coded in regular mode, we must also make sure that their probability model is updated accordingly. In the proposed method, the key syntax elements selected for encryption in H.266/VVC are the luma IPM value, the horizontal and vertical values and signs of the motion vector differences, and the signs of residual values. For the encryption process, a binary stream is generated using an encryption algorithm, and an XOR operation is then performed between this stream and the syntax elements. For this purpose, we recommend the Advanced Encryption Standard (AES) in the CTR mode of operation for encryption and decryption. The x and y position values of each unit can be used for the Initialization Vector (IV) of the encryption algorithm (e.g., AES-CTR), which maintains the security of the key during the encryption of all syntax elements. Since some syntax elements have a specific range (e.g., the luma MPM index ranges from 0 to 5), the XOR operation must be performed carefully so that the final value does not fall outside the valid range of the corresponding syntax element. Without such considerations, the bitstream would no longer be format compliant and the decoder might fail. Thus, we choose a desirable
number of last bits from the binary stream generated by AES-CTR and perform the XOR operation only if the resulting value is not beyond the range of valid values. For luma IPMs, the encoder either encodes the index of a Most Probable Mode (MPM) (predicted from the neighboring units) or, when the mode is not among the MPMs, encodes its index among the 61 remaining modes. The MPM index is encoded in bypass mode and requires at most five bits, since it is binarized with truncated unary coding. For the remaining IPMs, six-bit fixed-length coding is used, again in bypass mode. Thus, luma modes are encrypted according to the selected luma intra prediction mode type and its respective binarization. For format compliance, when an MPM is used, the encrypted index must not be greater than 5. Likewise, when the prediction mode is not among the MPMs, the encrypted remaining IPM index is restricted to 61 values, because the six MPMs are excluded from the 67 available luma modes. Motion vector differences consist of absolute values and corresponding signs for the horizontal and vertical directions. The absolute values are encoded using two flags in regular mode (abs_mvd_greater_0 and abs_mvd_greater_1) and abs_mvd_minus_2, which is bypass coded with first-order Exp-Golomb (EG1) binarization. We chose to encrypt only abs_mvd_minus_2 and the corresponding signs, since they are encoded in bypass mode. Another syntax element selected for encryption is signPattern, which represents the signs of the residual coefficients and is encoded in bypass mode.

IV. EXPERIMENTAL RESULTS

A. Visual Security

Encrypting the selected syntax elements results in highly distorted videos, which indicates that the details and information of the videos are securely hidden. Fig. 5 shows the visual disturbance caused by encrypting each syntax element. These sample frames show the importance of luma IPMs in H.266/VVC videos, since their encryption causes most of the deterioration in the output video (column four of Fig. 5). After luma modes, MVDs are the syntax elements whose encryption causes the most deterioration in the final videos (column three of Fig. 5). Residual signs, however, have a smaller impact on the output video, as can be seen in column two of Fig. 5. The quantitative performance is compared using PSNR, SSIM [23], and VMAF [24], [25]. SSIM, the structural similarity index, compares the processed images (the encrypted video frames in our application) with the original images (the video encoded with VVC but not encrypted). For an encryption algorithm, the lower the SSIM, the better the performance. PSNR is another metric, which is not well suited to our application; however, because of its popularity, we have employed it as well. Video Multimethod Assessment Fusion (VMAF) is a relatively new video quality metric developed by Netflix. VMAF returns a score between 0 and 100, and a higher score
shows more resemblance between the given original and processed videos. Table III provides these metrics for the selected test videos with and without encryption. The SSIM values of the encrypted videos are much lower than the original values, and the PSNR values also confirm the visual deterioration of the encrypted videos. In some cases, we observe mediocre PSNR scores even though the SSIM is very low and the video is mostly disturbed, which indicates that PSNR might not be a very good metric here. Also, the VMAF scores of some original videos are not as high as expected (close to 100) because we did not use the 4K models; VMAF is trained with resolutions up to 1080p, so we see mediocre scores for some of the original videos. These scores can still be used to compare videos with each other. The values reported in Table III are averages over the first 64 frames of the selected test videos, while Fig. 6 shows the SSIM, PSNR, and VMAF results for each of the first 64 frames. When only the MVD elements (signs and values for the horizontal and vertical directions) are encrypted, the I frames do not have enough visual disturbance (e.g., frame 32 in Fig. 6). Encryption of the luma components, however, results in a steadily and highly deteriorated output for all frames. Moreover, because residual signs are not very important in hiding video information, encrypting only them does not yield acceptable disturbance in the encrypted videos. Because our work is the first to look at selective encryption in H.266/VVC, we compare our results with similar methods used in the H.265/HEVC and H.264/AVC compression standards. Table IV compares our results with some recent SE algorithms for H.265/HEVC. From this comparison, we see the higher disturbance that is caused in the …

TABLE III – PERFORMANCE OF THE SCHEME IN TERMS OF SSIM, PSNR AND VMAF FOR EACH VIDEO SEQUENCE.
Video Sequence | QP | Original SSIM | Original PSNR (dB) | Original VMAF | Encrypted SSIM | Encrypted PSNR (dB) | Encrypted VMAF
Mobile | 8 | 0.997 | 49.81 | 99.959 | 0.078 | 9.235 | 6.642
Mobile | 24 | 0.98 | 35.78 | 99.234 | 0.075 | 8.819 | 6.190
Mobile | 40 | 0.876 | 26.25 | 75.375 | 0.072 | 9.027 | 6.068
BasketballPass | 8 | 0.995 | 50.52 | 99.339 | 0.362 | 13.21 | 2.877
BasketballPass | 24 | 0.972 | 40.48 | 94.397 | 0.419 | 15.01 | 2.254
BasketballPass | 40 | 0.834 | 30.51 | 52.556 | 0.43 | 10.17 | 2.2
BQMall | 8 | 0.999 | 49.88 | 99.959 | 0.201 | 10.62 | 5.015
BQMall | 24 | 0.989 | 38.47 | 98.745 | 0.216 | 10.81 | 5.457
BQMall | 40 | 0.912 | 29.39 | 66.379 | 0.227 | 9.8 | 6.445
Johnny | 8 | 0.998 | 50.12 | 98.248 | 0.397 | 8.596 | 0
Johnny | 24 | 0.992 | 42.34 | 95.258 | 0.476 | 9.510 | 0
Johnny | 40 | 0.976 | 36.47 | 79.389 | 0.557 | 11.74 | 0
FourPeople | 8 | 0.999 | 50.09 | 98.339 | 0.227 | 8.75 | 0
FourPeople | 24 | 0.995 | 42.22 | 94.905 | 0.234 | 9.07 | 0
FourPeople | 40 | 0.974 | 34.71 | 75.771 | 0.266 | 9.93 | 0.005
BasketballDrive | 8 | 0.999 | 50.18 | 99.958 | 0.292 | 12.95 | 1.342
BasketballDrive | 24 | 0.994 | 38.81 | 99.875 | 0.321 | 12.06 | 1.618
BasketballDrive | 40 | 0.949 | 33.08 | 65.037 | 0.340 | 11.39 | 1.826
PeopleOnStreet | 8 | 0.999 | 50.25 | 99.959 | 0.085 | 9.90 | 6.658
PeopleOnStreet | 24 | 0.998 | 38.86 | 99.864 | 0.096 | 10.02 | 7.033
PeopleOnStreet | 40 | 0.977 | 29.48 | 62.668 | 0.099 | 9.79 | 6.973
Traffic | 8 | 0.999 | 49.85 | 99.959 | 0.11 | 10.09 | 3.899
Traffic | 24 | 0.998 | 40.65 | 95.48 | 0.126 | 9.28 | 3.815
Traffic | 40 | 0.982 | 32.69 | 67.681 | 0.135 | 9.16 | 3.638
RaceNight | 8 | 0.999 | 50.35 | 99.956 | 0.314 | 10.57 | 0
RaceNight | 24 | 0.997 | 37.37 | 96.456 | 0.355 | 7.54 | 0
RaceNight | 40 | 0.983 | 34.60 | 69.107 | 0.531 | 14.97 | 0
HoneyBee | 8 | 0.999 | 50.26 | 97.536 | 0.212 | 10.14 | 8.888
HoneyBee | 24 | 0.997 | 39.17 | 83.673 | 0.241 | 12.41 | 6.354
HoneyBee | 40 | 0.991 | 37.67 | 65.464 | 0.259 | 9.72 | 6.445
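Table III reports PSNR between the original (VVC-encoded, unencrypted) and encrypted frames. For reference, a minimal sketch of the standard per-frame PSNR, PSNR = 10 log10(MAX^2 / MSE), assuming each frame is given as a flat list of samples of one component and that MAX = 2^bitdepth - 1 (1023 for the 10-bit RaceNight and HoneyBee sequences). Which color components were used, and how they were combined, is not stated in this section, so that choice is an assumption here; the function name is hypothetical.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], processed: Sequence[int], bit_depth: int = 8) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are flat sequences of integer samples of the same component.
    Returns math.inf when the frames are identical (MSE = 0).
    """
    if len(original) != len(processed) or len(original) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)
    if mse == 0:
        return math.inf
    peak = (1 << bit_depth) - 1  # 255 for 8-bit, 1023 for 10-bit content
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    ref = [16, 32, 64, 128]
    noisy = [20, 30, 60, 140]
    print(f"PSNR: {psnr(ref, noisy):.2f} dB")
```

Since Table III reports averages over the first 64 frames, a helper like this would be applied to each decoded frame pair and the per-frame results averaged. Note that PSNR only measures the mean squared sample error, which is one reason it can stay at mediocre values while SSIM already indicates a structurally destroyed frame.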
TABLE IV. COMPARISON OF EXPERIMENTAL RESULTS WITH OTHER PROPOSED SELECTIVE ENCRYPTION ALGORITHMS.
Video Sequence | QP | SSIM: Boyadjis [6] (for HEVC) | SSIM: State of the art [17]-Enc (for HEVC) | SSIM: Presented Scheme (for VVC) | PSNR: Boyadjis [6] (for HEVC) | PSNR: State of the art [17]-Enc (for HEVC) | PSNR: Presented Scheme (for VVC)
Mobile | 8 | 0.070 | 0.058 | 0.078 | 10.81 | 10.89 | 9.23
Mobile | 24 | 0.077 | 0.076 | 0.075 | 10.64 | 10.53 | 8.82
Mobile | 40 | 0.110 | 0.097 | 0.072 | 11.46 | 10.59 | 9.03
BasketballPass | 8 | 0.320 | 0.260 | 0.362 | 15.21 | 14.89 | 15.21
BasketballPass | 24 | 0.408 | 0.372 | 0.419 | 15.48 | 15.40 | 15.51
BasketballPass | 40 | 0.457 | 0.459 | 0.43 | 16.39 | 16.94 | 14.17
BQMall | 8 | 0.238 | 0.215 | 0.201 | 13.96 | 14.56 | 10.62
BQMall | 24 | 0.301 | 0.282 | 0.216 | 14.32 | 14.54 | 10.81
BQMall | 40 | 0.332 | 0.312 | 0.227 | 14.82 | 14.40 | 9.8
Johnny | 8 | 0.468 | 0.449 | 0.397 | 13.38 | 13.87 | 8.60
Johnny | 24 | 0.569 | 0.552 | 0.476 | 13.77 | 13.68 | 9.51
Johnny | 40 | 0.592 | 0.580 | 0.557 | 13.41 | 13.40 | 11.746
FourPeople | 8 | 0.325 | 0.295 | 0.227 | 12.76 | 13.13 | 8.75
FourPeople | 24 | 0.402 | 0.381 | 0.234 | 13.61 | 13.55 | 9.07
FourPeople | 40 | 0.420 | 0.389 | 0.266 | 13.07 | 12.83 | 9.93
BasketballDrive | 8 | 0.492 | 0.496 | 0.321 | 14.65 | 15.21 | 12.955
BasketballDrive | 24 | 0.545 | 0.509 | 0.321 | 14.72 | 15.00 | 12.068
BasketballDrive | 40 | 0.582 | 0.560 | 0.340 | 15.49 | 15.26 | 11.391
PeopleOnStreet | 8 | 0.250 | 0.232 | 0.085 | 12.93 | 13.12 | 9.90
PeopleOnStreet | 24 | 0.294 | 0.262 | 0.096 | 13.23 | 13.10 | 10.018
PeopleOnStreet | 40 | 0.332 | 0.312 | 0.099 | 13.09 | 13.23 | 9.79
Traffic | 8 | 0.260 | 0.240 | 0.11 | 12.31 | 12.53 | 10.09
Traffic | 24 | 0.348 | 0.328 | 0.126 | 12.81 | 12.70 | 9.28
Traffic | 40 | 0.372 | 0.361 | 0.135 | 13.52 | 13.34 | 9.16
… video encoder, but here we are encrypting only some of the key … our scheme for selective encryption, the quantitative results should become even better. We should also mention that the computation cost depends on how many syntax elements and how many units are selected for encryption. The ideal case is when the video disturbance is high, i.e., the video is highly secure, while the computation cost remains low.

[Fig. 6: Per-frame SSIM score (a), PSNR, and VMAF score versus frame number for the first 64 frames; curves: Original, Residual Signs, MVD Values and Signs, Luma IPM, All.]

Edges in encrypted videos are an important factor in the visual security of video protected with selective encryption … about the objects in frames. Fig. 7 shows some sample frames in both original (row one) and encrypted (row two) forms. We can observe that an adversary cannot obtain any important information by considering the edges in the …

$$\mathrm{EDR} = \frac{\sum_{m,n=1}^{M} \left| P(m,n) - \bar{P}(m,n) \right|}{\sum_{m,n=1}^{M} \left| P(m,n) + \bar{P}(m,n) \right|}$$

where $P(m,n)$ is the value of the edge pixel in the original video frame and $\bar{P}(m,n)$ is the value of the edge pixel in the encrypted video. Table V lists the average EDR values over the first 64 frames for different QP values.
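A minimal sketch of the edge differential ratio, assuming the common definition EDR = Σ|P(m,n) − P̄(m,n)| / Σ|P(m,n) + P̄(m,n)| over binary edge maps of the original frame (P) and the encrypted frame (P̄). The edge detector used for Table V is not specified in this section, so `edge_map` below is a hypothetical gradient-threshold stand-in (a Sobel or Canny detector would be the usual choice), and its threshold of 32 is arbitrary.

```python
from typing import List

Frame = List[List[int]]    # luma samples, row-major
EdgeMap = List[List[int]]  # 1 = edge pixel, 0 = non-edge pixel


def edge_map(frame: Frame, threshold: int = 32) -> EdgeMap:
    """Hypothetical edge detector: a pixel is an edge when the absolute
    difference to its right or lower neighbour exceeds `threshold`."""
    h, w = len(frame), len(frame[0])
    edges = [[0] * w for _ in range(h)]
    for m in range(h):
        for n in range(w):
            dx = abs(frame[m][n] - frame[m][n + 1]) if n + 1 < w else 0
            dy = abs(frame[m][n] - frame[m + 1][n]) if m + 1 < h else 0
            edges[m][n] = 1 if max(dx, dy) > threshold else 0
    return edges


def edr(p: EdgeMap, p_bar: EdgeMap) -> float:
    """Edge differential ratio: 0 for identical edge maps, 1 for disjoint ones."""
    pairs = [(a, b) for row_p, row_q in zip(p, p_bar) for a, b in zip(row_p, row_q)]
    num = sum(abs(a - b) for a, b in pairs)
    den = sum(abs(a + b) for a, b in pairs)
    return num / den if den else 0.0  # no edges in either frame -> no difference


if __name__ == "__main__":
    original = [[0, 100, 100], [0, 100, 100], [0, 0, 0]]
    encrypted = [[50, 50, 0], [50, 50, 0], [0, 0, 0]]
    print("EDR:", edr(edge_map(original), edge_map(encrypted)))
```

For binary edge maps, the denominator counts each edge pixel once per map, so EDR approaches 1 when the edges of the encrypted frame no longer line up with those of the original, which matches the encrypted columns of Table V.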
EDR generally increases with QP: at large QP values, the strength of many edges changes, which results in a higher EDR value.

TABLE V. AVERAGE EDR SCORES OF ENCRYPTED AND ORIGINAL FRAMES
Video Sequence | QP = 8 (Org.) | QP = 8 (Enc.) | QP = 24 (Org.) | QP = 24 (Enc.) | QP = 40 (Org.) | QP = 40 (Enc.)
Mobile | 0.035 | 0.952 | 0.135 | 0.98 | 0.353 | 0.992
BasketballPass | 0.108 | 0.91 | 0.255 | 0.928 | 0.433 | 0.968
BQMall | 0.082 | 0.911 | 0.266 | 0.944 | 0.531 | 0.968
Johnny | 0.239 | 0.911 | 0.357 | 0.964 | 0.449 | 0.967
FourPeople | 0.194 | 0.931 | 0.35 | 0.946 | 0.535 | 0.938
BasketballDrive | 0.141 | 0.781 | 0.296 | 0.985 | 0.452 | 0.986
PeopleOnStreet | 0.192 | 0.946 | 0.316 | 0.98 | 0.557 | 0.991
Traffic | 0.195 | 0.957 | 0.396 | 0.987 | 0.444 | 0.983

B. Bit Rate Change

Because SE methods are integrated into the compression algorithm, many coefficients and values are changed, resulting in a different probability model for the syntax elements. This increases the bit rate (lowering the encoding efficiency). The number of encrypted elements and the choice of syntax elements directly affect the bit rate change. In Table VI, the bit rate change is reported for different video bitstreams and compared with other important related works. The results show that the bit rate increase remains around the same value as reported for other works in H.265/HEVC; however, the presented scheme for H.266/VVC results in more visual deterioration.

… value, the number of syntax elements encrypted by the presented method is 10 to 20 times larger than for selective encryption methods in previous standards. This is probably one of the reasons that selective encryption in H.266/VVC provides higher visual security (as discussed earlier). Moreover, Table VII shows a sharp change in the encryption space as the QP value changes. A higher QP value results in higher compression and lower output quality, which leaves fewer syntax elements to be encrypted (as more coefficients become zero).

TABLE VII. Comparison of Encryption Space for one I and three B frames.
Video Sequence | QP | Boyadjis [6] (for HEVC) | State of the art [17]-Enc (for HEVC) | Presented Scheme (for VVC)
Mobile | 8 | 408280 | 420276 | 2970442
Mobile | 24 | 91249 | 98087 | 1075159
Mobile | 40 | 15129 | 16234 | 285866
BasketballPass | 8 | 156381 | 162952 | 1747539
BasketballPass | 24 | 25586 | 27982 | 525422
BasketballPass | 40 | 2694 | 3207 | 69143
BQMall | 8 | 1335708 | 1369545 | 12605551
BQMall | 24 | 123706 | 133542 | 1887324
BQMall | 40 | 15913 | 17993 | 341932
Johnny | 8 | 1529922 | 1576207 | 16440069
Johnny | 24 | 67579 | 72493 | 1502727
Johnny | 40 | 9478 | 10669 | 188802
FourPeople | 8 | 1581943 | 1639071 | 15812535
FourPeople | 24 | 109661 | 115938 | 1947921
FourPeople | 40 | 19176 | 20552 | 371282
BasketballDrive | 8 | 5747621 | 5842386 | 74431080
BasketballDrive | 24 | 248473 | 268004 | 6289156
BasketballDrive | 40 | 25455 | 30993 | 428516
PeopleOnStreet | 8 | 1.05×10^7 | 1.08×10^7 | 111624820
PeopleOnStreet | 24 | 1144257 | 1276675 | 19361562
PeopleOnStreet | 40 | 136658 | 169194 | 2799887
Traffic | 8 | 9217222 | 9523708 | 100424218
Traffic | 24 | 782504 | 844394 | 14478533
Traffic | 40 | 103101 | 110918 | 1957091
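The statement that the presented method encrypts 10 to 20 times more syntax elements than earlier schemes is a ratio over Table VII. A small sketch that computes, for each sequence and QP, the encryption space of the presented VVC scheme relative to Boyadjis [6]; the choice of [6] as the baseline and the helper name `space_ratio` are ours, and the PeopleOnStreet QP 8 baseline uses the rounded 1.05×10^7 given in the table.

```python
# Encryption space from Table VII: (Boyadjis [6] for HEVC, presented scheme for VVC).
TABLE_VII = {
    ("Mobile", 8): (408280, 2970442),
    ("Mobile", 24): (91249, 1075159),
    ("Mobile", 40): (15129, 285866),
    ("BasketballPass", 8): (156381, 1747539),
    ("BasketballPass", 24): (25586, 525422),
    ("BasketballPass", 40): (2694, 69143),
    ("BQMall", 8): (1335708, 12605551),
    ("BQMall", 24): (123706, 1887324),
    ("BQMall", 40): (15913, 341932),
    ("Johnny", 8): (1529922, 16440069),
    ("Johnny", 24): (67579, 1502727),
    ("Johnny", 40): (9478, 188802),
    ("FourPeople", 8): (1581943, 15812535),
    ("FourPeople", 24): (109661, 1947921),
    ("FourPeople", 40): (19176, 371282),
    ("BasketballDrive", 8): (5747621, 74431080),
    ("BasketballDrive", 24): (248473, 6289156),
    ("BasketballDrive", 40): (25455, 428516),
    ("PeopleOnStreet", 8): (1.05e7, 111624820),  # baseline rounded in the table
    ("PeopleOnStreet", 24): (1144257, 19361562),
    ("PeopleOnStreet", 40): (136658, 2799887),
    ("Traffic", 8): (9217222, 100424218),
    ("Traffic", 24): (782504, 14478533),
    ("Traffic", 40): (103101, 1957091),
}


def space_ratio(sequence: str, qp: int) -> float:
    """How many times larger the VVC encryption space is than that of [6]."""
    hevc, vvc = TABLE_VII[(sequence, qp)]
    return vvc / hevc


if __name__ == "__main__":
    for sequence, qp in sorted(TABLE_VII):
        print(f"{sequence:16s} QP={qp:2d}  ratio={space_ratio(sequence, qp):5.1f}x")
```

With these values the ratios range from roughly 7× (Mobile, QP 8) to roughly 26× (BasketballPass, QP 40); most fall between 10× and 20×, the values below 10× all occur at QP 8, and the values above 20× all occur at QP 24 or 40.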