How To Encode Video For The Future: Dror Gill, CTO, Beamr


How to Encode Video for the Future

Dror Gill, CTO, Beamr

Abstract
The quality expectations of viewers, paired with the ever-increasing shift to over-the-top (OTT)
and mobile video consumption, are driving today’s networks to be more congested with video
than ever before. To counter this congestion, this paper covers advanced techniques for
applying content-adaptive encoding and optimization methods to video workflows, lowering
the bitrate of encoded video without compromising quality.

Intended Audience:
Business decision makers
Video encoding engineers

To learn more about optimized content-adaptive encoding, email info@beamr.com

©Beamr Imaging Ltd. 2017 | beamr.com


Table of contents
Encoding for the future.
The trade off between bitrate and quality.
Legacy approaches to encoding your video content.
What is constant bitrate (CBR) encoding?
What about variable bitrate encoding?
Encoding content with constant rate factor encoding.
Capped constant rate factor encoding for high complexity scenes.
Encoding content for the future.
Manually encoding content by title.
Manually encoding content by the category.
Content-adaptive encoding by the title and chunk.
Content-adaptive encoding using neural networks.
Closed loop content-adaptive encoding by the frame.
How should you be encoding your content?
References



Encoding for the future.

The standard method of encoding video for delivery over the Internet utilizes a pre-set group
of resolutions and bitrates known as adaptive bitrate (ABR) sets. Industry guidelines, such as
Apple’s Tech Note TN2224 [1], define a typical ABR set as a fixed set of bitrates and
resolutions for all video files regardless of the content type. In traditional terrestrial,
cable, and satellite TV, bandwidth is pre-allocated and fixed. In the world of IP streaming,
there is greater flexibility that allows for content-adaptive bitrate schemes to achieve
reduced bitrate while maintaining high video quality.

As content owners and video streaming providers look to the future, they are assessing:
How to balance bitrate and quality
Different video coding approaches
Techniques for content-adaptive encoding

Below, we cover approaches that operate with existing video coding standards such as H.264/AVC
and H.265/HEVC. There are other approaches that use proprietary video coding schemes and
add-ons, but they tend to overcomplicate deployments. The methods explained below range from
simple techniques meant for small scale implementations, to more complex methods that need
access to a large database of content and/or significant compute power. Lastly, we’ll cover a
closed-loop process that leverages a perceptually-aligned quality measure.

The trade off between bitrate and quality.

When streaming video over the Internet, video encoding is typically done using an ABR set. Each
ABR set of encoding configurations includes resolutions, bitrates, and encoding parameters,
which are used to encode a single video title. ABR sets are designed to deliver an optimal
range of viewing experiences to connected users over varying network bandwidths and devices.

Block-based video encoding schemes are inherently lossy processes that achieve compression by
removing information from the bitstream while taking into account how it will impact the file
size and perceived visual quality of the video.

The rate control algorithm adjusts encoder parameters in order to achieve a targeted bitrate.
This algorithm allocates a budget of bits to each group of pictures, individual frames, and in
some cases sub-frames in a video sequence. The quantization parameter (QP) regulates how much
spatial detail is retained:
When using higher QP values, bitrate is reduced and perceived quality lowered.
While using lower QP values, the bitrate will be increased along with the subjective quality of the video.

Legacy approaches to encoding your video content.

There are several approaches to encoding video content at scale.
Constant bitrate encoding
Variable bitrate encoding
Constant rate factor encoding
Capped constant rate factor encoding

Let’s discuss the advantages and disadvantages to each approach.

What is constant bitrate (CBR) encoding?

CBR (constant bitrate) rate control produces an output stream with a relatively constant data
rate. A highly regulated stream was required in the early days of video delivery by devices
which were based on hardware architecture, and did not have the flexibility to support varying
data rates.

The disadvantage of CBR is that applying a constant data rate to varying content complexity
inherently causes unstable output quality:
Complex scenes requiring more bits than the target data rate allows will suffer from low quality
Simpler scenes will be encoded with an unnecessarily high bit budget

What about variable bitrate encoding?

Variable bitrate (VBR) encoding is more efficient than CBR, since it allows the bitrate to vary
dynamically around a specified average data rate. The dynamic variations of bitrate in VBR are
typically tied to scene complexity, so the available bits are allocated in a more optimal way,
resulting in increased quality for complex scenes, and lower bitrate for simpler scenes.
However, since the VBR algorithms determine the amount of bits based on the complexity of
scenes and not according to a true perceptual quality measure, it is less efficient at adapting
to the content than other methods that will be presented below.

Encoding content with constant rate factor encoding.

Constant rate factor (CRF) encoding requires workflows to set a desired quality level for
encoding a video stream, as opposed to a fixed bitrate. While encoding with CRF, the encoder
tries to maintain the desired quality level and reduces bitrate during high motion scenes,
taking advantage of the fact that the human eye perceives less loss of detail in a moving scene
than in a static scene. Using CRF, the bitrate of the encoded file is not set in advance, and
can reach high levels for complex scenes. Another disadvantage of the CRF method is that the
same CRF value can yield different levels of perceptual quality across a video library, and
even between scenes in the same video.

Capped constant rate factor encoding for high complexity scenes.

Capped CRF encoding is a bitrate control technique which combines CRF encoding with a maximum
bitrate “cap.” The encoder adjusts the data rate to deliver the specified quality level, but
never exceeds the specified maximum bitrate, even in relatively complex streams that might
require a bitrate higher than the maximum to obtain the target quality. With lower complexity
content, CRF encoding results in a data rate that is lower than the maximum.

Figure 1 illustrates the drawback of the capped CRF approach. In scenes where the bitrate cap
is applied, which are typically scenes that require a higher bitrate in order to maintain the
subjective visual quality, the bitrate is constrained. This causes a compromise in video
quality in sections where the degradation will be the most visible.

FIGURE 1 - CAPPED CRF QUALITY DEGRADATION
[Figure 1: bitrate (vertical axis) over time (horizontal axis) across low, high, and low
complexity segments. Labels: “Actual bitrate needed for crf=21”; “maxrate = 4000k”; “Cap
applied causing effective crf value to drop and quality will drop as well”; “Buffer window”.]

The different encoding approaches described above offer tradeoffs between bitrate restriction
and quality. Still, the question remains how can we obtain the best or optimal bitrate tradeoff
decisions?

Encoding content for the future.

Content-adaptive encoding was engineered to configure video encoders according to the content
in video streams, instead of applying predetermined parameters. Using content-adaptive encoding
enables us to reach optimal bitrate vs quality trade-offs, allowing for improved quality at
similar bitrates, or equivalent
quality at a lower bitrate. These solutions help provide better user experience while reducing
infrastructure costs. Content-adaptive encoding approaches include:
Manual content-adaptive encoding per title
Manual content-adaptive encoding per category
Content-adaptive encoding by the title and chunk
Content-adaptive encoding using neural networks
Closed loop content-adaptive encoding by the frame

Manually encoding content by title.

One relatively straightforward method to reduce bitrates according to the needs of the content
is to manually customize encoding parameters for each file. This approach encompasses manually
experimenting with each video file, seeking parameters which provide the best quality at a
given bitrate, or the lowest bitrate possible for a predetermined quality. The process is
incredibly time consuming, not scalable, and not very effective, because it would be impossible
to manually tune the parameters for each section in the video stream, as made possible by some
of the methods described below. The manual per-title encoding approach is typically used where
volumes are extremely low but the value of each video title is high. Blu-ray is one example of
where this approach is workable.

Manually encoding content by the category.

Category specific content-adaptive encoding works by manually classifying a content library
into different content categories, such as movies, talk shows, PowerPoint presentations, etc.
Then the engineer finds the optimal encoding parameters and bitrates for each category and
encodes each using category-specific encoding parameters.

Jan Ozer in Video Encoding by the Numbers [2] outlined the following steps for implementing
category-specific encoding:

1. Separate a video repository into distinct categories, and identify 3-5 videos in each
category that represent a cross section of the content within that category. These videos will
serve as the test set for encoding.

2. Encode the videos in the test set using several CRF values. Examine the quality of the
resulting videos, and for each video in the set, find the CRF value that delivers acceptable
quality at the lowest bitrate, and average the bitrates across the videos in each category.
This average bitrate will be the target bitrate for that category.

3. Encode the test files at this target rate using either CBR or VBR encoding. Play the files
and verify that encoding quality is still acceptable. Test the file with an objective quality
metric (such as PSNR), and focus on both the average quality and the lowest quality regions or
frames.

If the quality is acceptable at the rate shown in step 2, build an ABR set for each class of
content based on the selected resolution and bitrate.

This per-category encoding approach is generally less effective for premium movies and TV
shows, since the range of encoding complexity within a class can vary significantly. As an
example, though animated movies are typically easier to compress than real world videos, the
range of animation techniques is highly diverse, ranging from 2D cel animation and 3D animation
to human-like computer graphics. This makes it challenging to create an ABR set that works
across all animated content equally. Thus a weakness of the categorized content encoding
approach can be seen when there is wide variety inside at least one category.

Another challenge of per-category encoding is the variability between scenes in a single title.
For example, a sports show may include talking head in-studio shots along with fast action game
shots and slow-motion recaps, each requiring different bitrate values to maximize quality while
ensuring the minimal bitrate possible.
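The bitrate selection in the per-category steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure as published in [2]: the function names, the numeric quality scores, and the `min_quality` threshold (standing in for the reviewer's judgment of "acceptable quality") are all hypothetical, and the measurements are made up.

```python
from statistics import mean


def lowest_acceptable_trial(trials, min_quality):
    """Pick the trial encode with the lowest bitrate that still meets the quality bar.

    trials: list of (crf, bitrate_kbps, quality_score) tuples for one video.
    min_quality: hypothetical stand-in for the reviewer's "acceptable quality" call.
    """
    acceptable = [t for t in trials if t[2] >= min_quality]
    if not acceptable:
        raise ValueError("no trial encode met the quality bar")
    return min(acceptable, key=lambda t: t[1])


def category_target_bitrate(test_set, min_quality):
    """Average the per-video picks to get one target bitrate for the category."""
    picks = {name: lowest_acceptable_trial(trials, min_quality)
             for name, trials in test_set.items()}
    target_kbps = mean(pick[1] for pick in picks.values())
    return target_kbps, picks


# Made-up measurements for two videos in a hypothetical "talk shows" category.
talk_shows = {
    "episode_a": [(18, 3000, 95), (23, 1800, 90), (28, 1000, 80)],
    "episode_b": [(18, 2400, 96), (23, 1400, 92), (28, 800, 89)],
}
target_kbps, picks = category_target_bitrate(talk_shows, min_quality=88)
print(target_kbps, picks)
```

Because the target is an average, a category whose videos differ widely in complexity ends up with a bitrate that is too high for some titles and too low for others, which is the weakness described above.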


Content-adaptive encoding by the title and chunk.

Netflix [3] proposed a fully automated approach for title specific encoding optimization, which
creates an optimal ABR set for each video title. This technique is based on encoding the
content at different CRF levels and resolutions, and then evaluating the quality of the
resulting videos using an objective quality measure. Netflix is using the Video Multimethod
Assessment Fusion (VMAF) quality metric [5] to evaluate video quality during the analysis
stage. The quality scores generate a Convex Hull which identifies the resolution which produces
the highest quality encode at each bitrate. An illustration of results is in Figure 2.

FIGURE 2 - CONVEX HULL GENERATED BY NETFLIX’S PER-TITLE ENCODING
[Figure 2: quality (vertical axis) versus bitrate (horizontal axis), with one curve each for
low, mid, and high resolution and the convex hull drawn over them.]

While the scheme in the Netflix proposal [3] uses CRF for the trial encodes which determine
quality, the final encode is done using 2-pass VBR. This approach is very computationally
intensive and requires multiple encodes at different resolutions and bitrates. It is suitable
for content libraries that are diverse, but limited in size, such as premium content including
TV series and movies.

Content-adaptive encoding using neural networks.

YouTube [4] presented an approach for customizing encoding parameters for user-generated videos
uploaded to their platform, by building a machine-learning algorithm that estimates the optimal
encoding parameters based on features extracted from a target encode of the video.

In the proposed technique, a sample of 10,000 videos is encoded using all possible CRF values,
and the objective quality of each encode is measured using the SSIM quality measure [6]. A
machine-learning algorithm correlates the optimal CRF value for each clip with the measurements
of each clip, including resolution, frame rate, bitrate and motion vectors. By applying this
machine-learning algorithm to all the videos in the database, the optimal CRF value can be
determined based on the features of the encoded video clip. Selection of the CRF value is
improved by adding data from a low-resolution CRF encode of the actual clip to the machine
learning algorithms.

FIGURE 3 - YOUTUBE RESULTS - CUMULATIVE DISTRIBUTION OF BITRATE ERRORS (%) ACROSS THE TEST SET
[Figure 3: cumulative percent of validation blocks (vertical axis, 0 to 100) versus bitrate
error in percent (horizontal axis, 0 to 100). Legend: “Using best NNLS fit parameters”; “Max.
smeared crf from NN with prev. transcode”; “Max. smeared crf from neural net classification”;
“Using pre-set CRF”.]

Results of this technique are shown in Figure 3, where the blue line illustrates best fit, the
red line is the original result from the neural network, while the orange line illustrates the
final result by combining the results of the low resolution CRF encode. The green line
represents the baseline using a preset CRF value.
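To help read a cumulative error plot like Figure 3, the following Python sketch computes a cumulative distribution of bitrate errors from a list of target and actual bitrates. The exact error definition used in [4] is not given in this paper, so the relative absolute deviation used here is an assumption, as are the function names and the made-up numbers.

```python
def bitrate_error_percent(target_kbps, actual_kbps):
    """Absolute deviation from the target bitrate, as a percentage of the target."""
    return 100.0 * abs(actual_kbps - target_kbps) / target_kbps


def cumulative_percent_within(errors, thresholds):
    """For each threshold, the percent of items whose error is at or below it."""
    return [100.0 * sum(e <= t for e in errors) / len(errors) for t in thresholds]


# Made-up (target, actual) bitrates in Kbps for four clips.
pairs = [(1000, 1050), (2000, 1600), (1500, 1500), (800, 1200)]
errors = [bitrate_error_percent(t, a) for t, a in pairs]
curve = cumulative_percent_within(errors, thresholds=[10, 20, 50])
print(curve)
```

On a plot of this kind, a method whose curve rises faster toward 100 percent lands closer to its bitrate target on more of the test set.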


Closed loop content-adaptive encoding per frame.

Beamr has developed a process for content-adaptive encoding based on closed loop re-encoding of
input frames at various compression levels, while checking the value of a proprietary quality
measure [8] that has high correlation with human subjective results. The input to the Beamr
process is a video file that has been compressed at the desired quality level, which serves as
the quality reference for the re-encoding process. This process is applied after the initial
encode of the video stream, but before packaging for streaming.

As described in Figure 4, the closed loop content-adaptive encoding process consists of the
following steps:

1. Decode the source video frame.

2. Re-encode the source video frame using a compression level determined by the system
controller, creating a candidate output frame.

3. Obtain the decoded candidate output frame.

4. Compare the decoded candidate output frame to the decoded source frame using an objective
perceptual quality measure.

5. If the quality is above a high threshold, indicating that the source frame can be compressed
to a lower size without compromising perceptual quality, increase the compression level and
return to step 2.

6. If the quality is below a low threshold, indicating that the candidate output frame is not
perceptually identical to the input frame, decrease the compression level and return to step 2.

7. If the quality is between the low threshold and the high threshold, output the candidate
output frame to the output stream, move to the next source frame, and return to step 1.

FIGURE 4 - CLOSED LOOP FRAME-LEVEL OPTIMIZATION PROCESS

By evaluating the quality of the video on a frame-by-frame basis, this method of content
adaptive encoding ensures that the results of encoding are optimal: the overall video file is
encoded to the lowest possible size, while fully retaining the source quality of each frame.

Results

Figure 5 shows a graph of bitrate as a function of time for a 1080p source video before
optimization, and a corresponding output video after optimization. The average bitrate in this
case was reduced from 3510 Kbps to 2584 Kbps, while the peak bitrate (marked in green) was
reduced from 7593 Kbps to 5598 Kbps. It is also important to note that in sections of the video
where the bitrate cannot be further reduced without compromising quality, the source bitrate is
retained (marked in red). The result is a video that is visually identical to the source, only
smaller.

FIGURE 5 - BITRATE GRAPH COMPARING ORIGINAL TO OPTIMIZED STREAM

Table 1 shows the results of Beamr’s closed loop frame-level content-adaptive optimization on a
set of movie trailers, comprising various content types including animation, action and drama.
The results clearly show the different bitrate savings that can be achieved on titles that have
different resolutions, bitrates and content. This means that where bits per pixel are high,
savings will be greater than where the ratio of bits per pixel is low.


TABLE 1 - RESULTS OF BEAMR’S OPTIMIZATION PROCESS ON A COLLECTION OF MOVIE TRAILERS
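The closed loop steps listed in the previous section can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not Beamr's implementation: the quality measure in [8] is proprietary, so `toy_encode`, `toy_decode` and `toy_quality` are hypothetical stand-ins (a simple quantizer and a mean absolute error score), the thresholds are arbitrary, and the guard against bouncing between two levels is an addition that the steps above do not describe.

```python
def optimize_frame(frame, encode, decode, quality, low, high,
                   level=0, min_level=0, max_level=51):
    """Search for a compression level whose quality lands between low and high.

    A higher level means stronger compression. Returns (level, candidate).
    """
    tried = {}
    while level not in tried:
        candidate = encode(frame, level)            # step 2: re-encode
        score = quality(frame, decode(candidate))   # steps 3-4: decode and compare
        tried[level] = (score, candidate)
        if score > high and level < max_level:
            level += 1                              # step 5: compress harder
        elif score < low and level > min_level:
            level -= 1                              # step 6: back off
        else:
            return level, candidate                 # step 7: accept candidate
    # Assumed guard: the search revisited a level, so it is bouncing between a
    # level that is too good and one that is too poor. Keep the strongest
    # compression that still met the low threshold.
    passing = [lv for lv, (score, _) in tried.items() if score >= low]
    best = max(passing) if passing else min(tried)
    return best, tried[best][1]


# Hypothetical stand-ins so the sketch runs without a real codec.
def toy_encode(frame, level):
    step = level + 1
    return step, [value // step for value in frame]


def toy_decode(encoded):
    step, values = encoded
    return [value * step + step // 2 for value in values]


def toy_quality(reference, decoded):
    error = sum(abs(a - b) for a, b in zip(reference, decoded)) / len(reference)
    return 100.0 - error


frame = list(range(0, 256, 5))
level, candidate = optimize_frame(frame, toy_encode, toy_decode, toy_quality,
                                  low=96.0, high=98.0)
final_score = toy_quality(frame, toy_decode(candidate))
print(level, final_score)
```

In a real system the loop would run once per source frame (step 1), and the controller could start each frame from the level chosen for the previous one to reduce the number of re-encodes; that starting-point choice is a design assumption here rather than something the paper states.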

How should you be encoding your content?

In this paper, we detailed why the industry needs to address the quality-bitrate challenge
caused by the massive consumer adoption of streaming video, in order to meet user expectations
and network and infrastructure constraints. We presented the case for content-adaptive encoding
and the various approaches possible, including their advantages and disadvantages. All of the
methods presented are a step up from existing preset recipes, such as those outlined in [1],
which take a one-size-fits-all approach.

Of the proposed solutions, the closed loop per-frame content adaptation, when combined with a
reliable perceptual quality metric, provides the best solution in terms of scalability and
adaptivity to different content types, as well as being the only method that can guarantee no
loss of quality compared to the input clip. We believe that this approach will be a driving
force in the industry, and will enable reaching optimal quality-bitrate trade-off decisions.



REFERENCES

[1] Technical Note TN2224, “Best Practices for Creating and Deploying HTTP Live Streaming Media for Apple Devices”, Apple Inc., August 2016. http://snip.ly/ev144

[2] Jan Ozer, “Video Encoding by the Numbers: Eliminate the Guesswork from Your Streaming Video”, Doceo Publishing, January 2017. http://snip.ly/gy9vn

[3] Anne Aaron, Zhi Li, Megha Manohara, Jan De Cock and David Ronca, Netflix, “Per-Title Encode Optimization”, The Netflix Tech Blog, December 14th, 2015. http://snip.ly/hh2lq

[4] Michele Covell, Martin Arjovsky, Yao-Chung Lin and Anil Kokaram, “Optimizing transcoder quality targets using a neural network with an embedded bitrate model”, Proceedings of the Conference on Visual Information Processing and Communications 2016, San Francisco. http://snip.ly/ys7id

[5] Zhi Li, Anne Aaron, Ioannis Katsavounidis, Anush Moorthy and Megha Manohara, “Toward A Practical Perceptual Video Quality Metric”, The Netflix Tech Blog, June 6th, 2016. http://snip.ly/fl1pp

[6] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity”, IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.

[7] “The Case for Content-Adaptive Optimization”, Beamr, March 2016. http://snip.ly/j08l4

[8] US Patent 9,491,464, “Controlling a video content system by computing a frame quality score”. http://snip.ly/laz7t
