Rpribas, 327-QuaseFinal
2, 2021 1
Abstract— The new Versatile Video Coding (VVC) standard was recently developed to improve on the compression efficiency of previous video coding standards and to support new applications. This compression gain came at the cost of higher computational complexity in the encoder algorithms, which creates the need for hardware accelerators and approximate computing techniques to reach the performance and power dissipation required by video encoding systems. This work proposes an approximate hardware architecture for the interpolation filters defined in the VVC standard, targeting the fractional motion estimation requirements of real-time, high-resolution video encoding. The architecture includes four filter cores in parallel, each generating 15 fractional pixels per clock cycle, for a total of 60 fractional pixels per cycle. Each filter core approximates the original 8-tap and 7-tap interpolation filters defined in the VVC standard with 6-tap filters and applies the Multiple Constant Multiplication (MCM) algorithm to optimize the filter datapaths. Operating at 522 MHz, the architecture processes videos of up to 2560 × 1600 pixels at 30 fps with a power dissipation of 23.9 mW and an average compression efficiency degradation of only 0.41% compared to the default VVC encoder software configuration.

Index Terms— Video coding; Versatile Video Coding; Interpolation Filter; Hardware; Architecture.

I. INTRODUCTION

Digital video is widespread in many electronic devices, enabling a diversity of applications such as video on demand, digital television, and video surveillance. The demand for digital video keeps growing, driven by the increasing number of devices: a forecast by Cisco points out that by 2023 the number of devices connected to Internet Protocol (IP) networks will be more than three times the global population [1]. This demand, together with the rise of video resolutions and frame rates, pushes up the Internet traffic related to video transmission. By 2023, 66% of flat-panel TVs will support Ultra-High-Definition (UHD) or 4K resolution (3840 × 2160 pixels), further increasing video traffic over the Internet. Today, video accounts for about 80% of total Internet traffic, and this share is expected to keep growing in the coming years [2].

Given these demands, and the market need for applications with even higher visual quality, videos are constantly produced with higher spatial resolution, higher bit depth, and higher frame rate, increasing storage and transmission requirements. A 1920 × 1080-pixel video at 30 frames per second (fps), with pixels represented with 24 bits, produces a raw bit rate of 186.6 MB per second; storing 1 hour of this video would require 671.8 GB. UHD 4K videos increase bit rate and storage requirements by 4× compared to HD video. Such raw video representations are therefore unfeasible to use, which motivates the need for video compression.

The Versatile Video Coding (VVC) [3, 4] standard was recently developed by the International Telecommunication Union (ITU) Video Coding Experts Group (VCEG) and the International Organization for Standardization (ISO) Moving Picture Experts Group (MPEG) to increase compression efficiency compared to the previous VCEG/MPEG standard, High Efficiency Video Coding (HEVC) [5], and to be versatile enough to support different video applications, e.g. high dynamic range, screen content, multiview, and 360-degree videos. As reported by [4], VVC offers bit rate savings of about 50% compared to HEVC for equal subjective quality. However, this comes at the cost of higher computational complexity for video encoding. On average over different videos, the processing time of the VVC encoder software is 10.2 times that of the HEVC encoder when Single Instruction Multiple Data (SIMD) instructions are enabled, and 15.9 times when SIMD instructions are disabled [6].

Motion Estimation (ME) stands out as one of the most computing-intensive parts of modern encoders. This step is commonly composed of integer motion estimation (IME) and fractional motion estimation (FME), each requiring several block-matching operations. FME is particularly costly, as it requires the fractional pixels to be interpolated before block matching. To interpolate these samples, the HEVC standard uses 3 different 8-tap FIR filters to generate the 1/2- and 1/4-pixel positions. VVC increases this complexity by introducing 1/16-pixel precision for motion vectors in Affine mode [7]. Since 1/16-pixel precision yields 16 × 16 − 1 = 255 fractional positions around each integer pixel, against 4 × 4 − 1 = 15 at the 1/4-pixel precision of HEVC, VVC fractional interpolation is at least 17× more complex than HEVC fractional interpolation.

The high computational complexity of the VVC standard also raises power consumption concerns for mobile devices. A common and efficient way to address them is to implement hardware accelerators, since dedicated hardware architectures are more power- and energy-efficient. Recent solutions also rely on approximate computing to further reduce power of in-
tion are based on a few videos, which do not represent a real

[Figure: diagram with blocks labeled "Intra-frame prediction" and "inter/intra decision"]
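The raw-video figures quoted in the introduction follow from simple arithmetic. The Python sketch below reproduces them, assuming decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and 24 bits per pixel; the function name `raw_bytes_per_second` is hypothetical and only serves this illustration.

```python
def raw_bytes_per_second(width, height, fps, bits_per_pixel):
    """Uncompressed video data rate in bytes per second (decimal units assumed)."""
    return width * height * fps * bits_per_pixel / 8


hd = raw_bytes_per_second(1920, 1080, 30, 24)
uhd = raw_bytes_per_second(3840, 2160, 30, 24)
one_hour_hd = hd * 3600  # seconds in one hour

print(f"HD: {hd / 1e6:.1f} MB/s")                # HD: 186.6 MB/s
print(f"1 h of HD: {one_hour_hd / 1e9:.1f} GB")  # 1 h of HD: 671.8 GB
print(f"UHD/HD: {uhd / hd:.0f}x")                # UHD/HD: 4x
```

The 4× factor comes purely from the pixel count, since 3840 × 2160 is exactly twice 1920 × 1080 in each dimension.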
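The 17× complexity ratio and the 522 MHz operating frequency can be cross-checked with a back-of-the-envelope model. The sketch below assumes, as a simplification that the text does not state, that all 255 fractional positions are interpolated for every integer pixel of every frame, and it ignores block-border samples and pipeline latency. Under that assumption, producing 60 fractional pixels per cycle (4 cores × 15) for 2560 × 1600 video at 30 fps requires about 522.24 MHz, close to the reported 522 MHz. The helper names are hypothetical.

```python
def fractional_positions(denominator):
    """2-D sub-pixel positions around one integer pixel: every (dx, dy)
    offset on a denominator x denominator grid except (0, 0)."""
    return denominator ** 2 - 1


def required_clock_hz(width, height, fps, positions, pixels_per_cycle):
    """Clock rate needed if every integer pixel of every frame requires
    `positions` fractional samples (simplifying assumption)."""
    return width * height * fps * positions / pixels_per_cycle


HEVC_POSITIONS = fractional_positions(4)   # 1/4-pixel precision
VVC_POSITIONS = fractional_positions(16)   # 1/16-pixel precision
ratio = VVC_POSITIONS / HEVC_POSITIONS

# 60 fractional pixels per cycle = 4 filter cores x 15 pixels each
clock = required_clock_hz(2560, 1600, 30, VVC_POSITIONS, pixels_per_cycle=60)

print(HEVC_POSITIONS, VVC_POSITIONS, ratio)  # 15 255 17.0
print(f"{clock / 1e6:.2f} MHz")              # 522.24 MHz
```

A real FME search usually evaluates only a subset of fractional positions, so this model should be read as an upper-bound workload rather than as the authors' throughput analysis.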
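The MCM idea mentioned in the abstract replaces constant multipliers with shared shift-and-add networks. The sketch below illustrates it on the exact (non-approximated) 8-tap VVC half-pixel luma filter, with coefficients (-1, 4, -11, 40, 40, -11, 4, -1); it does not reproduce the 6-tap approximate filters of this work. The decomposition (5x shared between 11x and 40x) was derived by hand for illustration and is neither the output of the Spiral generator [23] nor the authors' datapath. Function names are hypothetical.

```python
# Exact VVC half-pixel luma filter (not the approximate 6-tap version); sums to 64.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def mcm_products(x):
    """Multiply x by the constants {1, 4, 11, 40} using only shifts and adds.

    5x is computed once and reused for 11x and 40x, so the block needs
    two adders; in hardware the shifts are just wiring.
    """
    x5 = (x << 2) + x                # 5x  (adder 1)
    return {
        1: x,
        4: x << 2,                   # 4x
        11: (x5 << 1) + x,           # 10x + x = 11x  (adder 2)
        40: x5 << 3,                 # 8 * 5x = 40x
    }


def half_pel_mcm(samples):
    """Unnormalized 8-tap half-pixel interpolation without multipliers."""
    if len(samples) != 8:
        raise ValueError("expected 8 integer samples")
    acc = 0
    for tap, s in zip(HALF_PEL_TAPS, samples):
        product = mcm_products(s)[abs(tap)]
        acc += product if tap > 0 else -product
    return acc


ramp = [10, 20, 30, 40, 50, 60, 70, 80]
acc = half_pel_mcm(ramp)
print(acc, acc // 64)  # 2880 45
```

On a linear ramp, the output divided by 64 is the midpoint between the two center samples (45), as expected for a symmetric filter whose taps sum to 64. The division by 64 is only illustrative; the intermediate normalization used by the standard is not modeled here.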
[8] C. M. Diniz, B. Abreu, M. Grellert, F. M. Sampaio, D. Palomino, F. L. L. Ramos, B. Zatt, and S. Bampi, “Joint algorithm-architecture design of video coding modules,” VLSI Architectures for Future Video Coding, p. 41, 2019.
[9] B. Bing, Next-generation video coding and streaming. John Wiley & Sons, 2015.
[10] A. Can Mert, E. Kalali, and I. Hamzaoglu, “A low power versatile video coding (VVC) fractional interpolation hardware,” in 2018 Conference on Design and Architectures for Signal and Image Processing (DASIP). IEEE, 2018, pp. 43–47.
[11] C. M. Diniz, M. Shafique, S. Bampi, and J. Henkel, “High-throughput interpolation hardware architecture with coarse-grained reconfigurable datapaths for HEVC,” in 2013 IEEE International Conference on Image Processing, 2013, pp. 2091–2095.
[12] ——, “A reconfigurable hardware architecture for fractional pixel interpolation in high efficiency video coding,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, no. 2, pp. 238–251, 2015.
[21] H. Azgin, E. Kalali, and I. Hamzaoglu, “An approximate versatile video coding fractional interpolation hardware,” in 2020 IEEE International Conference on Consumer Electronics (ICCE). IEEE, 2020, pp. 1–4.
[22] F. Bossen, “JVET common test conditions and software reference configurations for SDR video,” in Document JVET-N1010, 14th JVET Meeting, Geneva, CH, 2019.
[23] J. M. Moura, J. Johnson, R. Johnson, D. Padua, V. Prasanna, M. Püschel, and M. Veloso. (2020) Spiral multiplier block generator. http://spiral.ece.cmu.edu/mcm/gen.html.
[24] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” VCEG-M33, 2001.
[25] VTM. (2020) VVC test model (VTM) v. 10.1rc1. https://jvet.hhi.fraunhofer.de/.