How To Encode Video For The Future: Dror Gill, CTO, Beamr
Abstract
The quality expectations of viewers, paired with the ever-increasing shift to over-the-top (OTT)
and mobile video consumption, are driving today’s networks to be more congested with video
than ever before. To counter this congestion, this paper covers advanced techniques for
applying content-adaptive encoding and optimization methods to video workflows, lowering
the bitrate of encoded video without compromising quality.
Intended Audience:
Business decision makers
Video encoding engineers
Below, we cover approaches that operate with existing video coding standards such as H.264/AVC
and H.265/HEVC. There are other approaches that use proprietary video coding schemes and
add-ons, but they tend to overcomplicate deployments. The methods explained below range from
simple techniques meant for small scale implementations, to more complex methods that need
access to a large database of content and/or significant compute power. Lastly, we’ll cover a
closed-loop process that leverages a perceptually-aligned quality measure.

There are several approaches to encoding video content at scale.

Constant bitrate encoding
Variable bitrate encoding
Constant rate factor encoding
Capped constant rate factor encoding

Let’s discuss the advantages and disadvantages to each approach.
at adapting to the content than other methods that will be presented below.

(Figure residue. Annotation: “Cap applied causing effective crf value to drop and quality will
drop as well”. Stray text: “for high complexity scenes.” Axis label: BITRATE.)

Capped constant rate factor encoding

Capped CRF encoding is a bitrate control technique which combines CRF encoding with a
maximum bitrate “cap.” The encoder adjusts the data rate to deliver the specified quality
level, but never exceeds the specified maximum bitrate, even in relatively complex streams

Encoding content for the future.

Content-adaptive encoding was engineered to configure video encoders according to the content
in video streams, instead of applying predetermined parameters. Using content-adaptive
encoding enables us to reach optimal bitrate vs quality trade-offs, allowing for improved
quality at similar bitrates, or equivalent
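As an illustration of the capped CRF description above, here is a minimal sketch that builds a command line for one possible encoder setup. The paper does not name a specific tool, so the choice of ffmpeg with libx264, the helper name capped_crf_command, and the buffer size of twice the cap are all assumptions made for this example.

```python
from typing import List


def capped_crf_command(src: str, dst: str, crf: int, max_kbps: int) -> List[str]:
    """Build an ffmpeg/libx264 argument list for capped CRF encoding.

    Hypothetical helper: the paper names no tool. The buffer size of
    2x the cap is an assumed starting point, not a recommendation
    taken from the paper.
    """
    if not 0 <= crf <= 51:
        raise ValueError("8-bit libx264 accepts CRF values from 0 to 51")
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-crf", str(crf),                # quality target (CRF part)
        "-maxrate", f"{max_kbps}k",      # the bitrate "cap"
        "-bufsize", f"{2 * max_kbps}k",  # rate-control buffer the cap is enforced over
        dst,
    ]


if __name__ == "__main__":
    print(" ".join(capped_crf_command("input.mp4", "output.mp4", 23, 4000)))
```

The function only assembles arguments and does not run the encoder, so the CRF value and the cap can be inspected or logged before any encode is launched.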
Netflix [3] proposed a fully automated approach for title specific encoding optimization, which
creates an optimal ABR set for each video title. This technique is based on encoding the
content at different CRF levels and resolutions, and then evaluating the quality of the
resulting videos using an objective quality measure. Netflix is using the Video Multimethod
Assessment Fusion (VMAF) quality metric [5] to evaluate video quality during the analysis
stage. The quality scores generate a Convex Hull which identifies the resolution which
produces the highest quality encode at each bitrate. An illustration of results is in Figure 2.

FIGURE 2 - CONVEX HULL GENERATED BY NETFLIX’S PER-TITLE ENCODING
(Axes: quality vs. bitrate. Curves: low resolution, mid resolution, high resolution, convex hull.)

While the scheme in Netflix proposal [3] uses CRF for the trial encodes which determine
quality, the final encode is done using 2-pass VBR. This approach is very

Content-adaptive encoding using neural networks.

YouTube [4] presented an approach for customizing encoding parameters for user-generated
videos uploaded to their platform, by building a machine-learning algorithm that estimates the
optimal encoding

In the proposed technique, a sample of 10,000 videos are encoded using all possible CRF
values, and the objective quality of each encode is measured using the SSIM quality measure
[6]. A machine-learning algorithm correlates between the optimal CRF value for each clip, and
the measurements of each clip including resolution, frame rate, bitrate and motion vectors. By
applying this machine-learning algorithm to all the videos in the database, the optimal CRF
value can be determined based on the features of the encoded video clip. Selection of the CRF
value is improved by adding data from a low-resolution CRF encode of the actual clip to the
machine learning algorithms.

FIGURE 3 - YOUTUBE RESULTS - CUMULATIVE DISTRIBUTION OF BITRATE ERRORS (%) ACROSS THE TEST SET
(Vertical axis: cumulative percent of validation blocks. Legend: Using best NNLS fit
parameters; Max. smeared crf from NN with prev. transcode; Max. smeared crf from neural net
classification; Using pre-set CRF.)

Results of this technique are shown in Figure 3, where the blue line illustrates best fit, the
red line is the original result from the neural network, while the orange line illustrates the
final result by combining the results of the low resolution CRF encode. The green line
represents the baseline using a preset CRF value.
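Returning to the per-title analysis attributed to Netflix [3] in this section, the convex hull step can be sketched in a few lines. This is an illustrative sketch only, not Netflix’s implementation: the Encode record, the upper_hull function and the SAMPLE numbers are invented for the example, and the quality field stands in for any objective score such as VMAF.

```python
from typing import Dict, Iterable, List, NamedTuple


class Encode(NamedTuple):
    resolution: str      # label of the encode resolution, e.g. "720p"
    bitrate_kbps: float  # measured bitrate of the trial encode
    quality: float       # objective quality score (higher is better)


def _cross(o: Encode, a: Encode, b: Encode) -> float:
    """Cross product of vectors OA and OB in the (bitrate, quality) plane."""
    return ((a.bitrate_kbps - o.bitrate_kbps) * (b.quality - o.quality)
            - (a.quality - o.quality) * (b.bitrate_kbps - o.bitrate_kbps))


def upper_hull(encodes: Iterable[Encode]) -> List[Encode]:
    """Return the trial encodes on the upper convex hull, by rising bitrate."""
    # Keep only the best-scoring encode when two share the same bitrate.
    best: Dict[float, Encode] = {}
    for e in encodes:
        if e.bitrate_kbps not in best or e.quality > best[e.bitrate_kbps].quality:
            best[e.bitrate_kbps] = e
    hull: List[Encode] = []
    for p in sorted(best.values(), key=lambda e: e.bitrate_kbps):
        # Drop the previous point while it lies on or below the line to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Made-up trial results for three resolutions (illustration only).
SAMPLE = [
    Encode("360p", 250, 45), Encode("360p", 500, 60), Encode("360p", 1000, 66),
    Encode("720p", 600, 58), Encode("720p", 1200, 78), Encode("720p", 2400, 85),
    Encode("1080p", 1500, 70), Encode("1080p", 3000, 90), Encode("1080p", 6000, 95),
]

if __name__ == "__main__":
    for e in upper_hull(SAMPLE):
        print(e.resolution, e.bitrate_kbps, e.quality)
```

Each point that survives on the hull carries its resolution label, which is how the hull indicates which resolution to use in a given bitrate range. If the highest-bitrate encode is not also the highest-quality one, the hull returned here ends with a descending tail that a caller would likely trim.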
Beamr has developed a process for content-adaptive encoding based on closed loop re-encoding
of input frames at various compression levels, while checking the value of a proprietary
quality measure [8] that has high correlation with human subjective results. The input to the
Beamr process is a video file that has been compressed at the quality level desired which
serves as the quality reference for the re-encoding process. This process is applied after the
initial encode of the video stream, but before packaging for streaming.

As described in Figure 4, the closed loop content-adaptive encoding process consists of the
following steps:

1. Decode the source video frame.

2. Re-encode the source video frame using a compression level determined by the system
controller, creating a candidate output frame.

4. Compare the decoded candidate output frame to the decoded source frame using an objective
perceptual quality measure.

5. If the quality is above a high threshold, indicating that the source frame can be
compressed to a lower size without compromising perceptual quality, increase the compression
level and return to step 2.

6. If the quality is below a low threshold, indicating that the candidate output frame is not
perceptually identical to the input frame, decrease the compression level and return to step 2.

7. If the quality is between the low threshold and the high threshold, output the candidate
output frame to the output stream, move to the next source frame, and return to step 1.

By evaluating the quality of the video on a frame-by-
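The threshold logic in the steps above can be sketched as a small search over compression levels for a single frame. This is a hedged illustration, not Beamr’s implementation: the function name closed_loop_level, the quality_at callback (which stands in for re-encoding, decoding and scoring a candidate frame), the 0 to 51 level range, and the bisection step size are all assumptions. The paper does not say how far the system controller moves the compression level on each pass.

```python
from typing import Callable


def closed_loop_level(
    quality_at: Callable[[int], float],
    low: float,
    high: float,
    start: int,
    min_level: int = 0,
    max_level: int = 51,
) -> int:
    """Pick a compression level for one frame using the two-threshold loop.

    quality_at(level) is a hypothetical callback: re-encode the frame at
    `level`, decode the candidate, and score it against the decoded source.
    Assumes the score does not rise as the compression level rises.
    """
    lo_level, hi_level = min_level, max_level  # levels not yet ruled out
    level = max(min_level, min(start, max_level))
    best_safe = min_level  # fallback: least compression
    while lo_level <= hi_level:
        score = quality_at(level)
        if score < low:
            hi_level = level - 1   # visible loss: compress less (step 6)
        else:
            best_safe = level      # quality is acceptable at this level
            if score <= high:
                return level       # inside the band: accept (step 7)
            lo_level = level + 1   # headroom left: compress more (step 5)
        level = (lo_level + hi_level) // 2
    # No level landed inside the band; return the strongest acceptable level.
    return best_safe


if __name__ == "__main__":
    # Toy quality model (made up): the score falls 2 points per level.
    print(closed_loop_level(lambda lv: 100 - 2 * lv, low=60, high=70, start=10))
```

Bisection is used here so the loop is guaranteed to terminate even when no level lands between the two thresholds. In that case the sketch returns the highest level tested whose score stayed at or above the low threshold, or the minimum level if none did.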
[1] Technical Note TN2224, “Best Practices for Creating and Deploying HTTP Live Streaming Media for Apple Devices”, Apple Inc., August 2016. http://snip.ly/ev144
[2] Jan Ozer, “Video Encoding by the Numbers: Eliminate the Guesswork from Your Streaming Video”, Doceo Publishing, January 2017. http://snip.ly/gy9vn
[3] Anne Aaron, Zhi Li, Megha Manohara, Jan De Cock and David Ronca, Netflix, “Per-Title Encode Optimization”, The Netflix Tech Blog, December 14th, 2015. http://snip.ly/hh2lq
[4] Michele Covell, Martin Arjovsky, Yao-Chung Lin and Anil Kokaram, “Optimizing transcoder quality targets using a neural network with an embedded bitrate model”, Proceedings of the Conference on Visual Information Processing and Communications 2016, San Francisco. http://snip.ly/ys7id
[5] Zhi Li, Anne Aaron, Ioannis Katsavounidis, Anush Moorthy and Megha Manohara, “Toward A Practical Perceptual Video Quality Metric”, The Netflix Tech Blog, June 6th, 2016. http://snip.ly/fl1pp
[6] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity”, IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.
[7] “The Case for Content-Adaptive Optimization”, Beamr, March 2016. http://snip.ly/j08l4
[8] US Patent 9,491,464, “Controlling a video content system by computing a frame quality score”. http://snip.ly/laz7t