Selective Encryption of VVC Encoded Video Streams For The Internet of Video Things



Amir Fotovvat and Khan A. Wahid, Senior Member, IEEE

Abstract: Visual sensors serve as a critical component of the Internet of Things (IoT). There is an ever-increasing demand for broad applications and higher resolutions of videos and cameras in smart homes and smart cities, such as in security cameras. To utilize the large volume of video data generated from networks of visual sensors for various machine vision applications, the data needs to be compressed and securely transmitted over the Internet. H.266/VVC, as the new compression standard, brings the highest compression for visual data. To provide security along with high compression, a selective encryption method for hiding information of videos is presented for this new compression standard. Selective encryption methods can lower the computation overhead of the encryption while keeping the video bitstream format, which is useful when the video goes into untrusted blocks such as transcoding or watermarking. Syntax elements that represent considerable information are selected for the encryption, i.e., luma Intra Prediction Modes (IPMs), Motion Vector Differences (MVDs), and residual signs. The results of the proposed method are then investigated in terms of visual security and bit rate change. Our experiments show that the encrypted videos provide higher visual security compared to other similar works in previous standards, and integration of the presented encryption scheme into the VVC encoder has little impact on the bit rate efficiency (it results in a 2% to 3% bit rate increase).

Index Terms: Selective Encryption, H.266/VVC, IoVT, Security, Video encryption

I. INTRODUCTION

In the coming years, the Internet of Things (IoT) will become a key technology in the world. The purpose of IoT is to connect a variety of end nodes and devices to the Internet in order to exchange data and to help automate many tasks. New advances in IoT will bring many opportunities along with many challenges. The Internet of Video Things (IoVT) is an essential subset of the IoT which can be defined as the internetworking of visual sensors [1]. Visual sensors generate versatile and richer data; with the advances in machine vision and artificial intelligence, cameras are becoming a favorite component in IoT systems. For example, automated retail stores such as Amazon Go employ cameras and sensors instead of cashiers and workers to serve shoppers. Other broad areas where IoVT can be applied are home security, video surveillance, and smart cities. Compared to other IoT sensors and devices, ubiquitous application of visual sensors will lead to more challenges, since the storage, computation, transmission, and privacy of large volumes of videos require significant consideration [1]. Considering the growing demand for IoVT and video streaming, the amount of video data on the Internet is rapidly increasing. In [2], it is predicted that by 2030 there will be 13 billion cameras around the world, generating a huge amount of data. Cisco predicts that in 2022, 82% of IP traffic will be consumed by video content [3]. Another report indicates a significant demand for video content in future smart homes [4]; Fig. 1 shows the estimated bandwidth requirement from the results of this report (applications are sorted by their predicted time of appearance, with entries nearer the top expected to appear earlier). With the upcoming video technologies and the demand for higher bandwidth, it is necessary to provide efficient solutions for privacy, storage, and transmission of video data.

There are two general methods to encrypt a video prior to transmission: naive encryption and selective encryption. In naive encryption, the whole video bitstream is encrypted. Using a secure encryption algorithm, this type of video encryption completely hides any information within the video. In contrast, selective encryption encrypts only some of the video elements. Fig. 2 shows an overview of a simplified IoT environment with visual sensors. Depending on the application and scenario, some of the computations, such as encryption and compression, can be processed at the edge. Then, the video data can be transmitted to the cloud for other processing, such as machine vision applications and storage. So, while a video is being compressed at the edge, some of the most important elements of the encoder can be encrypted to hide the information of the video, and the encrypted video can then be transmitted to the cloud servers.

Naive encryption is preferred in applications where complete confidentiality of videos is the top priority, since it hides all the video information. Selective encryption can lower computation complexity and is also a suitable choice for situations where the video stream passes through other untrusted blocks (for applications such as transcoding, watermarking, and cutting) while being transmitted through the network [5]. This is because, unlike naive encryption, videos encrypted by selective encryption approaches are decodable and can be treated like a normal video stream [6]. These features make selective encryption a suitable choice for Digital Rights Management (DRM) services as well [7]. Such SE algorithms can provide a sufficient protection level with relatively small

The authors are with the Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada (e-mail: a.fotovvat@usask.ca; khan.wahid@usask.ca).

computation cost and delay. The level of output visual deterioration in an SE algorithm depends entirely on the number and type of syntax elements that are selected for the encryption. Thus, to hide video information using these methods, the most important elements of the encoder that carry significant information of the video, such as Motion Vector Differences (MVD), MVD signs, and Intra Prediction Modes (IPM), are usually encrypted. The encryption of syntax elements in the encoder should not cause the decoder to behave unexpectedly; in other words, the encryption should not break the format compliancy of the video bitstream, so that it can be decoded by the receiver with no problem. Thus, for the selection of syntax elements in selective encryption methods, format compliancy of encoder and decoder is of high importance. This paper proposes a selective encryption algorithm for H.266/VVC, which is the latest video compression standard. This next generation of compression standard provides up to 50% more compression than H.265/HEVC [8], thus it can play an important role in reducing the bandwidth consumption or storage requirement of video content. In the remainder of the paper, part one of Section II provides an overview of the related works, and part two discusses VVC coding and some of its new algorithms and improvements over HEVC. Section III presents the methodology of the tests along with a discussion of the presented selective encryption scheme. Finally, Section IV discusses the experimental results.

Fig. 1. Bandwidth demand for future connected homes (Mbps).

Fig. 2. IoT environment with visual sensors (visual sensors, edge computing, networking, and cloud computing).

II. ENCRYPTION OF ENCODED VIDEO STREAMS

A. Overview of Related Works
In the related literature, a variety of methods have been proposed to securely hide video content information. The encryption process can be performed after video encoding, where some or all of the encoded bitstream is encrypted; it can be integrated inside the compression algorithm (selective encryption); or it can be performed by scrambling the pixels with chaos-based algorithms. Thus, the performance and characteristics of these methods essentially depend on the domain the encryption is applied to [6]. In [9], for the encryption of MPEG-1 bitstreams, it is proposed to encrypt the complete bitstream or important parts such as headers, depending on the desired security level. Since this is considered naive encryption, format compliance is not maintained. Muhammad et al. [10] present a method for encryption of visual data in IoT systems. They use probabilistic algorithms to detect frames with a required level of abnormality, then the pixels of these keyframes are encrypted using chaotic maps and pseudorandom number generators (PRNGs). Based on our experiments, we observed that, depending on the number of encrypted frames, chaotic encryption algorithms can become quite expensive in terms of computation cost because of the processes used for the scrambling of pixels. In another paper, Preishuber et al. also show some other problems with chaos-based encryption algorithms [11]. Yang et al. [12] propose discrete sine and cosine transforms for the purpose of video encryption in H.264/AVC. In [13], residual components, intra and inter prediction modes, and MVD components are used for the encryption in the H.264/AVC standard. Wallendael et al. [14]
TABLE I. SOME OF RELATED WORKS FOR VIDEO ENCRYPTION
Video Encryption Scheme | Compression Standard | Cipher | Elements Used for Encryption | Bitrate Increase | Format Compliancy
[6] | H.264/AVC, H.265/HEVC | AES-CTR | Luma IPM, MVD sign and values, MV reference idx, Merge idx, MVP idx, Residual sign and values, SAO filter | Yes | Yes
[9] | MPEG-1 | DES-CBC | Headers, DCT coefficients, or whole bitstream (based on security levels) | No | No
[10] | N/A | PRNG | Pixels | No | Yes
[14] | H.265/HEVC | AES | MVD sign and values, Residual sign and values, delta QP, reference pic idx, Merge idx, MVP idx, SAO parameter | Yes | Yes
[15] | H.265/HEVC | AES-CTR | Luma and chroma IPM, QTC, MVD signs and values | Yes | Yes
[16] | H.265/HEVC | RC4 | QTC, MVD sign and values, IPM luma | Yes | Yes
[17] | H.265/HEVC | AES-CTR | Chroma and Luma IPM, Residual sign and values, MVD sign and values, Merge idx, MVP idx, Reference frame idx, SAO parameter | Yes | Yes

investigated the encryption of syntax elements in the HEVC encoder, such as MVD values, delta QP, and residual components. They also discussed the effect of encryption on the encoder bit rate and visual scrambling of output frames. In [6], the authors presented an SE algorithm for Context-Adaptive Binary Arithmetic Coding (CABAC) in the H.264/AVC and H.265/HEVC standards. They consider encryption of a wider range of elements in the aforementioned standards, including luma IPM components. The presented experimental results of their work show the impact of the selective encryption on the encoder bit rate and visual distortion of video frames. Thiyagarajan et al. [15] proposed a more efficient SE algorithm for the Internet of multimedia things. They used a method to estimate the energy level of frames; based on that energy level, the algorithm decides which syntax elements of the H.265/HEVC standard are used for the encryption. It should be noted, however, that the estimation of texture and motion energy levels in each frame adds extra overhead, so the real efficiency of the proposed system will be lower. Because their method does not encrypt all of the syntax elements, the reported metrics used to compare the visual distortion indicate lower visual deterioration compared to other methods. Xu [16] proposes a similar encryption approach using MVD, IPM, and Quantized Transform Coefficient (QTC) elements along with a method for data embedding using QTC values in the H.265/HEVC standard. The authors of [17] tried to encrypt almost all of the important syntax elements of the H.265/HEVC standard and presented their work as a tunable approach. In addition, another scrambling method is used to further distort the edges, since regular SE algorithms might not completely hide edge regions; of course, this added scrambling process will increase the computation complexity of the proposed method. The authors' results also indicate that the bitrate increase would be between 2% and 10% depending on whether TU coefficients are scrambled or not. Table I provides a summary of some of the related works on selective encryption. In this paper, we address selective encryption in H.266/VVC to see how the improvements in this new compression algorithm affect SE methods. Since no other work is available for SE algorithms on H.266/VVC, we compare the results with previous similar works in which the authors used previous compression standards.

B. H.266/VVC Video Compression Standard
Since 2015, there have been many efforts and activities to develop a new compression standard. It was not until 2020 that the Joint Video Experts Team (JVET) of the ITU and the ISO/IEC Moving Picture Experts Group (MPEG) announced the finalization of H.266/VVC (otherwise known as ISO/IEC 23090-3). This new compression standard provides up to 50% more compression compared to the H.265/HEVC standard and supports video resolutions from 4K to 16K in addition to 360-degree videos. The official reference software of the H.266/VVC standard is the VVC Test Model (VTM) [18]. However, a faster and more optimized software named VVenC was recently released [19]. The new standard is in fact very similar to HEVC since it is still a hybrid video codec (see Fig. 3). However, H.266/VVC comes with many new coding tools along with many improvements and refinements over the previously implemented technologies. In terms of performance, Laude et al. [20] conducted a comprehensive performance comparison of the VTM, AV1, HM (HEVC software), x264, and x265 video coding software. Their results show that, on average, VTM gives 5% better BD-rate performance compared to AV1. For 4K videos, VTM outperforms AV1 with a 20% gain. However, the VTM encoding time is around two to three times that of AV1, depending on the tested video sequences. We should mention that since H.266/VVC was announced very recently, there will be more optimizations of its features and implementations in the near future. H.266/VVC is a hybrid codec and very similar to previous standards; thus, we only discuss some of the new features and optimizations of H.266/VVC over H.265/HEVC that will affect the process of selective encryption.

Fig. 3. Block diagram of VTM software with the selective encryption block added before CABAC.

Fig. 4. (a) A sample block partitioning in VVC. (b) Quadtree splitting. (c) Vertical and horizontal binary multi-type tree splitting. (d) Vertical and horizontal ternary multi-type tree splitting.

During the process of compression, similar to H.265/HEVC, each frame is first split into smaller units which are named Coding Tree Units (CTUs). The size of each CTU in the H.265/HEVC standard is up to 64x64 pixels, but in H.266/VVC its dimension is increased to 128x128 pixels. Then, a sequence of CTUs is grouped together in rectangular shapes to form tiles. Also, two modes of slicing are considered in VVC: rectangular slice mode and raster scan slice mode. After dividing frames into CTUs, each CTU is further divided into Coding Units (CUs). In H.265/HEVC, during prediction and

transformation, CUs are further divided into smaller Prediction Units (PUs) and Transformation Units (TUs), respectively. However, the concept of separating CU, TU, and PU is no longer in use (except for CUs larger than the maximum transform length); VVC instead uses CUs as the basic processing units. Moreover, for the purpose of splitting CTUs, two types of hierarchical trees are used: quadtree (QT) and multi-type tree (MTT). First, CTUs are divided with a quaternary tree; then, if needed, the leaf nodes are further partitioned with a multi-type tree (the multi-type splitting types are shown in Fig. 4). As a result, the VVC encoder has a wide range of rectangular block shapes for a CTU to split into, which brings higher compression (as well as higher computation cost). Also, the tiles, slices, and subpictures are separated in the bitstream, which is suitable for parallel processing in the encoder and decoder. This technique is also helpful for 360-degree videos since it allows the receiver to decode only the regions of the video that the user is seeing.

For intra prediction, there are 67 intra modes in H.266/VVC, which make the prediction more accurate compared to the H.265/HEVC standard, which has 35 modes. Two of the 67 modes are the planar and DC modes; the rest are angular predictions. In the new coding standard, the intra prediction is similar to H.265/HEVC. Depending on the upper and left blocks of the current coding block, a list of six Most Probable Modes (MPMs) is constructed. Then, if the current prediction mode is one of the MPM elements, the encoder binarizes its index using truncated unary coding; otherwise, the index among the remaining 61 modes is binarized. Because the partitioning blocks in H.266/VVC are not necessarily square, some of the prediction angles have become wider than the usual 45 degrees to -135 degrees. This allows more reference pixels to be used for the prediction. For inter prediction, VVC has several new tools. One of the interesting new features is affine motion compensation. In previous codecs such as H.265/HEVC, object motion is modeled only as translation in two dimensions. However, real videos rarely contain purely translational motion; the movement of objects between video frames may also involve rotation or scaling (zooming in or out). Thus, affine motion compensation is introduced to represent these more complex motions. There are several other newly introduced tools for inter prediction, among which we can mention Merge Mode with MVD (MMVD), Symmetric MVD Coding, and Adaptive Motion Vector Resolution (AMVR). For more information and details of these features, readers can refer to the VTM algorithm descriptions [21].

III. VVC SELECTIVE ENCRYPTION

A. Configuration, Software, and Test Video Sequences
A wide range of video sequences, from low resolution to 4K videos with different frame rates and bit depths, are selected for our tests in order to investigate the results of the proposed SE algorithm (the selected test videos are shown in Table II). The reference software for H.266/VVC is VTM [18]; currently, its latest version is 10.2. There is another software for VVC coding named VVenC [19], which is much faster than VTM. We can look at VTM as the complete software with all of the tools and options, while VVenC provides a faster implementation of H.266/VVC. However, VVenC only provides the encoder, and the decoder side is not available yet. Thus, for simplicity, we used VVenC for encoding and VTM for decoding video streams during the tests. We have used the randomaccess_faster configuration of the VVenC software, which has a GOPsize of 32 and an intraPeriod of 32. In this case there is one I frame followed by 31 B frames. The base quantization parameter (QP) is selected from the values 8, 24, and 32 depending on the performed tests. The level value, which determines the tier that the encoded bitstream complies with, is chosen from Table 140 and Table 141 of [22] for each corresponding test video. Its value depends on the video resolution (luma height and width) and the frame rate of the video under test.

TABLE II. VIDEO SEQUENCES USED FOR EXPERIMENTS
Video | Resolution | Frame Rate | Bit depth
Mobile | 352x288 | 24 | 8
BasketballPass | 416x240 | 50 | 8
BQMall | 832x480 | 60 | 8
Johnny | 1280x720 | 60 | 8
FourPeople | 1280x720 | 60 | 8
BasketballDrive | 1920x1080 | 50 | 8
PeopleOnStreet | 2560x1600 | 30 | 8
Traffic | 2560x1600 | 30 | 8
RaceNight | 3840x2160 | 50 | 10
HoneyBee | 3840x2160 | 120 | 10

B. Encryption of Syntax Elements
The encryption is applied prior to the binarization of syntax elements (as shown in Fig. 3). Similar to HEVC, Context-Adaptive Binary Arithmetic Coding (CABAC) in VVC has two encoding modes: regular and bypass. In bypass mode, all symbols are considered equiprobable for the encoding. In regular mode, a probability model is determined using the context of elements, and the encoding is performed based on this probability model. Thus, encryption of syntax elements encoded in bypass mode is more suitable since it does not affect the probability model; disturbing that model would lower the encoding efficiency. For the encryption of elements that are encoded in regular mode, we should also make sure that their probability model is changed accordingly. In the proposed method, the key syntax elements considered for the encryption in H.266/VVC are the luma IPM value, the horizontal and vertical values/signs of motion vectors, and the signs of residual values. For the encryption process, a binary stream needs to be generated using an encryption algorithm, and then the XOR operation is applied between the stream and the syntax elements. For this purpose, it is recommended to use the Advanced Encryption Standard (AES) in CTR mode of operation for the encryption and decryption. The x and y position values of each unit can be used for the Initialization Vector (IV) of the encryption algorithm (e.g., AES-CTR), which will maintain the security of the key during the encryption of all syntax elements. Since some of the syntax elements have a specific range (e.g., luma most probable mode indices range from 0 to 5), the XOR operation should be performed carefully so that the final value is not outside the available range of the corresponding syntax element. Without such considerations, the bitstream may no longer be format compliant and the decoder might face problems. Thus, we choose a desirable

number of last bits in the binary stream generated from AES-CTR and then apply the XOR operation only if the value after XOR is not beyond the range of available values. For the encryption of luma IPM modes, the encoder either encodes the index of the Most Probable Mode (MPM) (predicted from the neighboring units) or it encodes the index among the 61 remaining modes (when the mode is not among the MPM modes). Encoding of the MPM index uses bypass mode and requires up to five bits since it uses truncated unary coding for binarization. For the remaining IPM modes, six-bit fixed-length coding is used and encoding is done in bypass mode. Thus, the encryption is applied to luma modes according to the selected mode for the luma intra prediction and its respective binarization. For format compliancy, when MPM modes are being used, the encrypted syntax element should not be greater than 5. Also, when the prediction mode is not among the MPM modes, the encrypted remaining IPM mode has a range of 61 values because six MPM modes are removed from all 67 available luma modes. Motion vectors consist of values and corresponding signs for the horizontal and vertical directions. The absolute values are encoded using two flags in regular mode (abs_mvd_greater_0 and abs_mvd_greater_1) and abs_mvd_minus_2, which is bypass coded with Golomb-Rice binarization. We have chosen to encrypt only abs_mvd_minus_2 and the corresponding signs since they are encoded in bypass mode. Another syntax element selected for encryption is signPattern, which represents the signs of residual coefficients and is encoded in bypass mode.

IV. EXPERIMENTAL RESULTS

A. Visual Security
The encryption of the selected syntax elements results in highly distorted videos, which indicates that the details and information of the videos are securely hidden. Fig. 5 shows the visual disturbance caused by the encryption of each syntax element. Based on these sample frames, we observe the importance of luma IPM modes in H.266/VVC videos, since their encryption causes most of the deterioration in the output video (as can be seen in column four of Fig. 5). After luma modes, MVDs are the next key syntax elements whose encryption causes the highest deterioration in the final videos (column three of Fig. 5). However, residual signs have a smaller impact on the output video, as can be seen from column two of Fig. 5. The quantitative performance of the videos is compared using PSNR, SSIM [23], and VMAF [24], [25]. SSIM is the structural similarity index, which compares the processed images (the encrypted video frames in our application) and the original images (when the video is encoded with VVC but is not encrypted). For an encryption algorithm, the lower the SSIM, the better the performance of the algorithm. PSNR is another metric which is not very well suited to our application; however, because of its popularity we have employed this metric as well. Video Multimethod Assessment Fusion (VMAF) is a relatively new metric for video quality measurement developed by Netflix. VMAF returns a score between 0 and 100, and a higher score

Fig. 5. Visual results of experimental tests. First column shows the original frames, second column is when chroma modes are encrypted, third column is when luma modes are encrypted, fourth column is when MVD values and signs are encrypted, and fifth column is when all previous elements are encrypted together.
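The range-constrained XOR described in Section III-B can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a SHA-256 counter construction stands in for the AES-CTR keystream so that the sketch needs only the Python standard library, and the function names, the way the unit position (x, y) is packed into the keystream input, and the choice of 3 and 6 keystream bits for the two luma IPM cases are all hypothetical.

```python
import hashlib

def keystream_bits(key: bytes, x: int, y: int, counter: int, nbits: int) -> int:
    """Return nbits keystream bits for the unit at position (x, y).

    Assumption: SHA-256 over key || x || y || counter is only a stand-in
    for one AES-CTR keystream block; a real system would use AES-CTR.
    """
    block = hashlib.sha256(
        key
        + x.to_bytes(4, "big")
        + y.to_bytes(4, "big")
        + counter.to_bytes(4, "big")
    ).digest()
    # Keep only the last nbits bits of the keystream block.
    return int.from_bytes(block, "big") & ((1 << nbits) - 1)

def encrypt_in_range(value: int, ks: int, max_value: int) -> int:
    """XOR value with keystream bits only if the result stays in 0..max_value.

    If the XOR result would leave the valid range, the value is passed
    through unchanged, which keeps the bitstream format compliant.
    """
    candidate = value ^ ks
    return candidate if candidate <= max_value else value

# The same function also decrypts: if the XOR was applied, applying it again
# restores the (valid) original; if it was skipped, XOR-ing the unchanged
# value reproduces the same out-of-range candidate and is skipped again.
key = b"demo key (hypothetical)"

# MPM index: valid values 0..5, so 3 keystream bits are used here.
ks_mpm = keystream_bits(key, x=64, y=32, counter=0, nbits=3)
enc_mpm = encrypt_in_range(4, ks_mpm, max_value=5)
dec_mpm = encrypt_in_range(enc_mpm, ks_mpm, max_value=5)

# Remaining (non-MPM) luma mode index: valid values 0..60, 6 keystream bits.
ks_rem = keystream_bits(key, x=64, y=32, counter=1, nbits=6)
enc_rem = encrypt_in_range(17, ks_rem, max_value=60)
dec_rem = encrypt_in_range(enc_rem, ks_rem, max_value=60)

print(enc_mpm, dec_mpm, enc_rem, dec_rem)
```

One consequence of this design is that some values are left unencrypted whenever the XOR result would fall outside the valid range, so the scheme trades a small amount of secrecy for guaranteed format compliance.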


shows more resemblance between the given original and processed videos. In Table III, the results for the mentioned metrics are provided for the selected test videos with encryption applied. We can see that the SSIM values for encrypted videos are much lower than the original values. The PSNR values also confirm the effect of encryption on the visual deterioration of encrypted videos. In some cases, we observe mediocre PSNR scores even though the SSIM is very low and the video quality is mostly disturbed. This shows that PSNR might not be a very good metric here. Also, the reason that the VMAF scores for some original videos are not as high as they should be (close to 100) is that we have not used the 4K models. VMAF is in fact a metric that is trained with resolutions up to 1080p; as a result, we see mediocre scores for some of the original videos. However, these scores can still be used as a comparative metric between videos. The values reported in Table III are averages over the first 64 frames of the selected test videos, while Fig. 6 shows the SSIM, PSNR, and VMAF results for each of the first 64 frames.

TABLE III. PERFORMANCE OF THE SCHEME IN TERMS OF SSIM, PSNR AND VMAF FOR EACH VIDEO SEQUENCE.
Video Sequence | QP | Original SSIM | Original PSNR | Original VMAF | Encrypted SSIM | Encrypted PSNR | Encrypted VMAF
Mobile | 8 | 0.997 | 49.81 | 99.959 | 0.078 | 9.235 | 6.642
Mobile | 24 | 0.98 | 35.78 | 99.234 | 0.075 | 8.819 | 6.190
Mobile | 40 | 0.876 | 26.25 | 75.375 | 0.072 | 9.027 | 6.068
BasketballPass | 8 | 0.995 | 50.52 | 99.339 | 0.362 | 13.21 | 2.877
BasketballPass | 24 | 0.972 | 40.48 | 94.397 | 0.419 | 15.01 | 2.254
BasketballPass | 40 | 0.834 | 30.51 | 52.556 | 0.43 | 10.17 | 2.2
BQMall | 8 | 0.999 | 49.88 | 99.959 | 0.201 | 10.62 | 5.015
BQMall | 24 | 0.989 | 38.47 | 98.745 | 0.216 | 10.81 | 5.457
BQMall | 40 | 0.912 | 29.39 | 66.379 | 0.227 | 9.8 | 6.445
Johnny | 8 | 0.998 | 50.12 | 98.248 | 0.397 | 8.596 | 0
Johnny | 24 | 0.992 | 42.34 | 95.258 | 0.476 | 9.510 | 0
Johnny | 40 | 0.976 | 36.47 | 79.389 | 0.557 | 11.74 | 0
FourPeople | 8 | 0.999 | 50.09 | 98.339 | 0.227 | 8.75 | 0
FourPeople | 24 | 0.995 | 42.22 | 94.905 | 0.234 | 9.07 | 0
FourPeople | 40 | 0.974 | 34.71 | 75.771 | 0.266 | 9.93 | 0.005
BasketballDrive | 8 | 0.999 | 50.18 | 99.958 | 0.292 | 12.95 | 1.342
BasketballDrive | 24 | 0.994 | 38.81 | 99.875 | 0.321 | 12.06 | 1.618
BasketballDrive | 40 | 0.949 | 33.08 | 65.037 | 0.340 | 11.39 | 1.826
PeopleOnStreet | 8 | 0.999 | 50.25 | 99.959 | 0.085 | 9.90 | 6.658
PeopleOnStreet | 24 | 0.998 | 38.86 | 99.864 | 0.096 | 10.02 | 7.033
PeopleOnStreet | 40 | 0.977 | 29.48 | 62.668 | 0.099 | 9.79 | 6.973
Traffic | 8 | 0.999 | 49.85 | 99.959 | 0.11 | 10.09 | 3.899
Traffic | 24 | 0.998 | 40.65 | 95.48 | 0.126 | 9.28 | 3.815
Traffic | 40 | 0.982 | 32.69 | 67.681 | 0.135 | 9.16 | 3.638
RaceNight | 8 | 0.999 | 50.35 | 99.956 | 0.314 | 10.57 | 0
RaceNight | 24 | 0.997 | 37.37 | 96.456 | 0.355 | 7.54 | 0
RaceNight | 40 | 0.983 | 34.60 | 69.107 | 0.531 | 14.97 | 0
HoneyBee | 8 | 0.999 | 50.26 | 97.536 | 0.212 | 10.14 | 8.888
HoneyBee | 24 | 0.997 | 39.17 | 83.673 | 0.241 | 12.41 | 6.354
HoneyBee | 40 | 0.991 | 37.67 | 65.464 | 0.259 | 9.72 | 6.445

When only the MVD elements are encrypted (signs and values for the horizontal and vertical directions), the I frames do not have enough visual disturbance (e.g., frame 32 in Fig. 6). However, encryption of the luma components results in a steady, highly deteriorated output for all frames. Moreover, because the residual signs are not highly important in hiding video information, encrypting only them will not yield acceptable disturbance in the encrypted videos. Because our work is the first one that looks at selective encryption in H.266/VVC, we can only compare the results with other similar methods that are used in the H.265/HEVC and H.264/AVC compression standards. Table IV shows a comparison with some other recent papers on SE algorithms in H.265/HEVC. From this comparison, we see the higher disturbance that is caused in the VVC encoded videos by the presented selective encryption

TABLE IV. COMPARISON OF EXPERIMENTAL RESULTS WITH OTHER PROPOSED SELECTIVE ENCRYPTION ALGORITHMS.
Video Sequence | QP | SSIM: Boyadjis (for HEVC) [6] | SSIM: State of the art (for HEVC) [17] - Enc | SSIM: Presented Scheme (for VVC) | PSNR: Boyadjis (for HEVC) [6] | PSNR: State of the art (for HEVC) [17] - Enc | PSNR: Presented Scheme (for VVC)
Mobile | 8 | 0.070 | 0.058 | 0.078 | 10.81 | 10.89 | 9.23
Mobile | 24 | 0.077 | 0.076 | 0.075 | 10.64 | 10.53 | 8.82
Mobile | 40 | 0.110 | 0.097 | 0.072 | 11.46 | 10.59 | 9.03
BasketballPass | 8 | 0.320 | 0.260 | 0.362 | 15.21 | 14.89 | 15.21
BasketballPass | 24 | 0.408 | 0.372 | 0.419 | 15.48 | 15.40 | 15.51
BasketballPass | 40 | 0.457 | 0.459 | 0.43 | 16.39 | 16.94 | 14.17
BQMall | 8 | 0.238 | 0.215 | 0.201 | 13.96 | 14.56 | 10.62
BQMall | 24 | 0.301 | 0.282 | 0.216 | 14.32 | 14.54 | 10.81
BQMall | 40 | 0.332 | 0.312 | 0.227 | 14.82 | 14.40 | 9.8
Johnny | 8 | 0.468 | 0.449 | 0.397 | 13.38 | 13.87 | 8.60
Johnny | 24 | 0.569 | 0.552 | 0.476 | 13.77 | 13.68 | 9.51
Johnny | 40 | 0.592 | 0.580 | 0.557 | 13.41 | 13.40 | 11.746
FourPeople | 8 | 0.325 | 0.295 | 0.227 | 12.76 | 13.13 | 8.75
FourPeople | 24 | 0.402 | 0.381 | 0.234 | 13.61 | 13.55 | 9.07
FourPeople | 40 | 0.420 | 0.389 | 0.266 | 13.07 | 12.83 | 9.93
BasketballDrive | 8 | 0.492 | 0.496 | 0.321 | 14.65 | 15.21 | 12.955
BasketballDrive | 24 | 0.545 | 0.509 | 0.321 | 14.72 | 15.00 | 12.068
BasketballDrive | 40 | 0.582 | 0.560 | 0.340 | 15.49 | 15.26 | 11.391
PeopleOnStreet | 8 | 0.250 | 0.232 | 0.085 | 12.93 | 13.12 | 9.90
PeopleOnStreet | 24 | 0.294 | 0.262 | 0.096 | 13.23 | 13.10 | 10.018
PeopleOnStreet | 40 | 0.332 | 0.312 | 0.099 | 13.09 | 13.23 | 9.79
Traffic | 8 | 0.260 | 0.240 | 0.11 | 12.31 | 12.53 | 10.09
Traffic | 24 | 0.348 | 0.328 | 0.126 | 12.81 | 12.70 | 9.28
Traffic | 40 | 0.372 | 0.361 | 0.135 | 13.52 | 13.34 | 9.16
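For readers who want to reproduce the PSNR columns of Tables III and IV, the usual definition of PSNR can be sketched as below. This is an illustrative sketch only: the paper does not state how frames or color planes were handled, so a single plane given as a flat sequence of sample values is assumed, and the function name `psnr` and its parameters are hypothetical.

```python
import math

def psnr(original, processed, max_value=255):
    """PSNR in dB between two equally sized frames (flat sample sequences).

    max_value is the peak sample value: 255 for 8-bit video and 1023 for
    10-bit video (the bit depths listed in Table II).
    """
    if len(original) != len(processed):
        raise ValueError("frames must have the same number of samples")
    # Mean squared error between corresponding samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)

# Tiny 8-bit example: a lightly distorted frame and a heavily distorted one.
reference = [10, 50, 90, 130, 170, 210]
slightly_off = [11, 49, 91, 129, 171, 209]
scrambled = [210, 10, 170, 50, 90, 130]
print(psnr(reference, slightly_off), psnr(reference, scrambled))
```

A per-sequence value such as those in the tables would then be the average of the per-frame results, for example over the first 64 frames as described in the text.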

algorithms. In most of the cases we notice the performance of 1.0

selective encryption to be higher compared with state of the art 0.9


Original
Residual Signs
0.8
works in HEVC standard. In [17], authors select almost all of MVD Values and Signs
Luma IPM
0.7 All
key syntax elements for the encryption in the H.265/HEVC

SSIM Score
0.6
video encoder but here we are encrypting only some of key 0.5

syntax elements. If wider range of elements were considered in 0.4

our scheme for the selective encryption, the quantitative results 0.3

should even become better. Also, we should mention that the 0.2

0.1
computation cost depends on how many syntax elements and
0.0
how many units are selected for the encryption. the ideal case 0 10 20 30 40 50 60
Frame Number
would be when the video disturbance is high, i.e., high security (a)
of video, while the computation cost remains low.
Edges of encrypted videos are an important factor in the visual security of video encrypted with selective encryption methods. If the selective encryption scheme is not good enough, the edges of the encrypted video might leak some information about the objects in the frames. Fig. 7 shows some sample frames in both original (row one) and encrypted (row two) forms. We can observe that an adversary cannot get any important information by considering the edges in the encrypted video. To validate the security of edges in the proposed method, we have considered the Edge Differential Ratio (EDR), as employed in [5]. The EDR is defined as:

$$\mathrm{EDR} = \frac{\sum_{m,n=1}^{M} \left| P(m,n) - \bar{P}(m,n) \right|}{\sum_{m,n=1}^{M} \left| P(m,n) + \bar{P}(m,n) \right|} \qquad (1)$$


where $M$ is the number of edge pixels, $P(m,n)$ is the value of the edge pixel in the original video frame, and $\bar{P}(m,n)$ is the value of the edge pixel in the encrypted video. In Table V, the average EDR values over the first 64 frames are listed for different QP values. The reported values confirm that the edges in the encrypted and original frames have very little similarity (a low EDR indicates high edge similarity between the given images). Note that this metric is completely dependent on the implemented edge detection algorithm. Also, from Table V, we see that for QP = 40 the EDR values of the original videos are relatively high; however, this does not indicate that the two compared videos are very different.

Fig. 6. Visual quality results after encrypting selected syntax elements for the first 64 frames of the BasketballDrive video sequence: (a) SSIM score, (b) PSNR, and (c) VMAF score versus frame number. Curves: Original, Residual Signs, MVD Values and Signs, Luma IPM, All.

Fig. 7. Edges shown in some of the video test sequences before and after performing video encryption. Images in the first row, (a) to (c), are the original frames, and the second row, (d) to (f), shows the corresponding encrypted frames.
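
The EDR of Eq. (1) can be illustrated with a short sketch. It assumes that the edge maps have already been produced by some edge detector and are given as equally sized binary grids (1 for an edge pixel, 0 otherwise). The function name `edr`, the list-of-rows representation, and the handling of the case where neither map contains an edge are assumptions made for this illustration, not details taken from the paper or from [5].

```python
def edr(edges_orig, edges_enc):
    """Edge Differential Ratio between two equally sized binary edge maps.

    Each map is a list of rows holding 0 (no edge) or 1 (edge).
    The result lies in [0, 1]: 0 means the edge maps are identical,
    1 means they share no edge pixel.
    """
    if len(edges_orig) != len(edges_enc):
        raise ValueError("edge maps must have the same number of rows")

    numerator = 0    # sum of |P - P_bar|
    denominator = 0  # sum of |P + P_bar|
    for row_orig, row_enc in zip(edges_orig, edges_enc):
        if len(row_orig) != len(row_enc):
            raise ValueError("edge maps must have the same row length")
        for p, p_bar in zip(row_orig, row_enc):
            numerator += abs(p - p_bar)
            denominator += abs(p + p_bar)

    if denominator == 0:
        # Assumption of this sketch: two edge-free maps count as identical.
        return 0.0
    return numerator / denominator


# Hypothetical 2x3 edge maps, used only to exercise the function.
original = [[1, 1, 0],
            [0, 1, 0]]
encrypted = [[1, 0, 1],
             [0, 0, 1]]
print(edr(original, encrypted))
```

Because the inputs are edge maps rather than raw frames, the value obtained in practice depends on the edge detector used to produce them, as noted in the text.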

The relatively high EDR of the original videos at QP = 40 is due to the fact that, when large QP values are used, the strength of many edges changes, which results in a higher EDR value.

B. Bit Rate Change
Because SE methods are integrated into the compression algorithms, many of the coefficients and values are changed, resulting in a different probability model for the syntax elements. This increases the encoded bit rate (lowering the encoding efficiency). The number of encrypted elements and the selected syntax elements directly affect the bit rate change. In Table VI, the bit rate change is reported for different video bitstreams and is compared with other important related works. The results show that the bit rate increase remains around the same value as reported for other works in H.265/HEVC; however, the presented scheme for H.266/VVC results in more visual deterioration.

TABLE V. AVERAGE EDR SCORES OF ENCRYPTED AND ORIGINAL FRAMES

                                      EDR
Video                QP = 8           QP = 24          QP = 40
Sequence           Org.    Enc.     Org.    Enc.     Org.    Enc.
Mobile             0.035   0.952    0.135   0.98     0.353   0.992
BasketballPass     0.108   0.91     0.255   0.928    0.433   0.968
BQMall             0.082   0.911    0.266   0.944    0.531   0.968
Johnny             0.239   0.911    0.357   0.964    0.449   0.967
FourPeople         0.194   0.931    0.35    0.946    0.535   0.938
BasketballDrive    0.141   0.781    0.296   0.985    0.452   0.986
PeopleOnStreet     0.192   0.946    0.316   0.98     0.557   0.991
Traffic            0.195   0.957    0.396   0.987    0.444   0.983

TABLE VI. BIT RATE CHANGE OF OUR METHOD COMPARED TO OTHER WORKS.

                                  Bit Rate Increase
Video Sequence     Boyadjis          State of the art        Presented Scheme
                   (for HEVC) [6]    (for HEVC) [17] – Enc   (for VVC)
Mobile             0.0177            0.0244                  0.0143
BasketballPass     0.0263            0.0430                  0.0266
BQMall             0.0216            0.0320                  0.0204
Johnny             0.0200            0.0345                  0.0204
FourPeople         0.0246            0.0348                  0.022
BasketballDrive    0.0119            0.0290                  0.0179
PeopleOnStreet     0.0151            0.0292                  0.0313
Traffic            0.0164            0.0249                  0.0174

C. Encryption Space
The available encryption space is an important factor in selective encryption algorithms since it determines the number of encrypted syntax elements. The encryption space should be large enough to make a brute-force attack difficult for an adversary who wants to find the keys used for the encryption. Table VII shows the encryption space used in our method compared with two other methods proposed for HEVC. It can be seen that, depending on the video sequence and assigned QP value, the number of syntax elements in the presented method is 10 to 20 times larger than in selective encryption methods for previous standards. This is probably one of the reasons that selective encryption in H.266/VVC provides higher visual security (as discussed earlier). Moreover, from Table VII, we notice a sharp change in encryption space as the QP value changes. A higher QP value results in higher compression and lower output quality, which causes fewer syntax elements to be encrypted (as more coefficients become zero).

TABLE VII. Comparison of Encryption Space for one I and three B frames.

                               Selective Encryption Algorithms
Video              QP    Boyadjis          State of the art        Presented Scheme
Sequence                 (for HEVC) [6]    (for HEVC) [17] – Enc   (for VVC)
Mobile             8     408280            420276                  2970442
                   24    91249             98087                   1075159
                   40    15129             16234                   285866
BasketballPass     8     156381            162952                  1747539
                   24    25586             27982                   525422
                   40    2694              3207                    69143
BQMall             8     1335708           1369545                 12605551
                   24    123706            133542                  1887324
                   40    15913             17993                   341932
Johnny             8     1529922           1576207                 16440069
                   24    67579             72493                   1502727
                   40    9478              10669                   188802
FourPeople         8     1581943           1639071                 15812535
                   24    109661            115938                  1947921
                   40    19176             20552                   371282
BasketballDrive    8     5747621           5842386                 74431080
                   24    248473            268004                  6289156
                   40    25455             30993                   428516
PeopleOnStreet     8     1.05 × 10^7       1.08 × 10^7             111624820
                   24    1144257           1276675                 19361562
                   40    136658            169194                  2799887
Traffic            8     9217222           9523708                 100424218
                   24    782504            844394                  14478533
                   40    103101            110918                  1957091

V. CONCLUSION
Considering the future video technologies in smart cities, the employment of high-resolution videos for streaming, security cameras, and industry applications is necessary. Hence, there is a need for work that addresses the security of videos. In this paper, a selective encryption method for videos encoded with the H.266/VVC standard is presented. Key syntax elements of the encoder that carry important information, such as MVD values and signs, IPM modes, and residual signs, are selected for the encryption process. Then, the integration of the presented selective encryption algorithm into the encoder is discussed in terms of both encoder performance (i.e., bit rate change) and visual security of the output video (i.e., SSIM, PSNR, VMAF). The results show that this method can be a suitable solution for maintaining the security of videos in most applications. The importance of such selective encryption algorithms is even more noticeable when dealing with videos that require large bandwidth, such as 4K/8K and 360-degree videos. Future work will examine the applicability of other syntax elements in H.266/VVC for selective encryption and search for more efficient methods in which encryption of syntax elements is applied to only a portion of the CUs, to reduce the computation cost while yielding the highest visual security.

REFERENCES

[1] C. W. Chen, "Internet of Video Things: Next-Generation IoT with Visual Sensors," IEEE Internet Things J., vol. 7, no. 8, pp. 6676–6685, Aug. 2020, doi: 10.1109/JIOT.2020.3005727.
[2] A. Mohan, K. Gauen, Y. Lu, W. W. Li, and X. Chen, "Internet of video things in 2030: A world with many cameras," in Proc. 2017 IEEE Int. Symp. Circuits and Systems (ISCAS), Baltimore, MD, 2017, doi: 10.1109/ISCAS.2017.8050296.
[3] Cisco. 2020 Global Networking Trends Report. 2020. Accessed: Nov. 03, 2020. [Online]. Available: https://www.cisco.com/c/m/en_us/solutions/enterprise-networks/networking-report.html
[4] Cisco. Cisco Annual Internet Report (2018–2023) White Paper. 2020. Accessed: Nov. 03, 2020. [Online]. Available: https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html
[5] A. I. Sallam, O. S. Faragallah, and E. S. M. El-Rabaie, "HEVC Selective Encryption Using RC6 Block Cipher Technique," IEEE Trans. Multimed., vol. 20, no. 7, pp. 1636–1644, Jul. 2018, doi: 10.1109/TMM.2017.2777470.
[6] B. Boyadjis, C. Bergeron, B. Pesquet-Popescu, and F. Dufaux, "Extended Selective Encryption of H.264/AVC (CABAC)- and HEVC-Encoded Video Streams," IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 4, pp. 892–906, Apr. 2017, doi: 10.1109/TCSVT.2015.2511879.
[7] Z. Shahid and W. Puech, "Visual protection of HEVC video by selective encryption of CABAC binstrings," IEEE Trans. Multimed., vol. 16, no. 1, pp. 24–36, Jan. 2014, doi: 10.1109/TMM.2013.2281029.
[8] A. Wieckowski et al., "Towards A Live Software Decoder Implementation For The Upcoming Versatile Video Coding (VVC) Codec," in Proc. 2020 IEEE Int. Conf. Image Process. (ICIP), Abu Dhabi, UAE, 2020, pp. 3124–3128, doi: 10.1109/ICIP40778.2020.9191199.
[9] J. Meyer and F. Gadegast. Security mechanisms for Multimedia-Data with the Example MPEG-I-Video. Accessed: Jan. 27, 2021. [Online]. Available: http://www.gadegast.de/frank/doc/secmeng.pdf
[10] K. Muhammad, R. Hamza, J. Ahmad, J. Lloret, H. Wang, and S. W. Baik, "Secure surveillance framework for IoT systems using probabilistic image encryption," IEEE Trans. Ind. Informatics, vol. 14, no. 8, pp. 3679–3689, Aug. 2018, doi: 10.1109/TII.2018.2791944.
[11] M. Preishuber, T. Hütter, S. Katzenbeisser, and A. Uhl, "Depreciating motivation and empirical security analysis of chaos-based image and video encryption," IEEE Trans. Inf. Forensics Secur., vol. 13, no. 9, pp. 2137–2150, Sep. 2018, doi: 10.1109/TIFS.2018.2812080.
[12] S.-K. A. Yeung, S. Zhu, and B. Zeng, "Design of new unitary transforms for perceptual video encryption," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 9, pp. 1341–1345, Sep. 2011, doi: 10.1109/TCSVT.2011.2125630.
[13] S. Lian, Z. Liu, Z. Ren, and Z. Wang, "Selective video encryption based on advanced video coding," in Proc. 6th Pacific-Rim Conf. Adv. Multimedia Inf. Process., Jeju Island, Korea, 2005, vol. 3768 LNCS, pp. 281–290, doi: 10.1007/11582267_25.
[14] G. Van Wallendael, A. Boho, J. De Cock, A. Munteanu, and R. Van de Walle, "Encryption for high efficiency video coding with video adaptation capabilities," IEEE Trans. Consum. Electron., vol. 59, no. 3, pp. 634–642, 2013, doi: 10.1109/TCE.2013.6626250.
[15] K. Thiyagarajan, R. Lu, K. El-Sankary, and H. Zhu, "Energy-Aware Encryption for Securing Video Transmission in Internet of Multimedia Things," IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 3, pp. 610–624, Mar. 2019, doi: 10.1109/TCSVT.2018.2808174.
[16] D. Xu, "Commutative Encryption and Data Hiding in HEVC Video Compression," IEEE Access, vol. 7, pp. 66028–66041, 2019, doi: 10.1109/ACCESS.2019.2916484.
[17] F. Peng, X. Zhang, Z.-X. Lin, and M. Long, "A Tunable Selective Encryption Scheme for H.265/HEVC Based on Chroma IPM and Coefficient Scrambling," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 8, pp. 2765–2780, Aug. 2020, doi: 10.1109/TCSVT.2019.2924910.
[18] jvet / VVCSoftware_VTM GitLab. Accessed on: Jan. 27, 2021. [Online]. Available: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM
[19] Fraunhofer Heinrich Hertz Institute. Fraunhofer Versatile Video Encoder (VVenC). Accessed on: Jan. 27, 2021. [Online]. Available: https://www.hhi.fraunhofer.de/en/departments/vca/technologies-and-solutions/h266-vvc/fraunhofer-versatile-video-encoder-vvenc.html
[20] T. Laude, Y. G. Adhisantoso, J. Voges, M. Munderloh, and J. Ostermann, "A Comprehensive Video Codec Comparison," APSIPA Trans. Signal Inf. Process., vol. 8, 2019, doi: 10.1017/atsip.2019.23.
[21] J. Chen, Y. Ye, and S. H. Kim. JVET-N1002 Algorithm Description for Versatile Video Coding and Test Model 5 (VTM 5). Accessed on: Jan. 21, 2021. [Online]. Available: http://phenix.it-sudparis.eu/jvet/doc_end_user/current_document.php?id=6641
[22] B. Bross, J. Chen, S. Liu, and Y.-K. Wang. JVET-S2001 Versatile Video Coding text specification Draft 10. Accessed on: Jan. 21, 2021. [Online]. Available: http://phenix.it-sudparis.eu/jvet/doc_end_user/current_document.php?id=10399
[23] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004, doi: 10.1109/TIP.2003.819861.
[24] R. Rassool, "VMAF reproducibility: Validating a perceptual practical video quality metric," in Proc. 2017 IEEE Int. Symp. Broadband Multimedia Systems and Broadcasting (BMSB), Cagliari, 2017, doi: 10.1109/BMSB.2017.7986143.
[25] Netflix/vmaf: Perceptual video quality assessment based on multi-method fusion. Accessed on: Jan. 21, 2021. [Online]. Available: https://github.com/Netflix/vmaf
