NVIDIA VIDEO ENCODER (NVENC) INTERFACE
Programming Guide
NVENC_VideoEncoder_API_PG-06155-001_v07
Chapter 1. Introduction
Chapter 2. Basic Encoding Flow
Chapter 3. Setting Up Hardware for Encoding
3.1 Opening an Encode Session
3.1.1 Initializing encode device
3.2 Selecting Encoder Codec GUID
3.3 Encoder Preset Configurations
3.3.1 Enumerating preset GUIDs
3.3.2 Selecting encoder preset configuration
3.4 Selecting an Encoder Profile
3.5 Getting Supported List of Input Formats
3.6 Querying Encoder Capabilities
3.7 Initializing the Hardware Encoder Session
3.8 Encode Session Attributes
3.8.1 Configuring encode session attributes
3.8.2 Finalizing codec configuration for encoding
3.8.3 Rate control
3.8.4 Setting encode session attributes
3.9 Creating Resources Required to Hold Input/Output Data
3.10 Retrieving Sequence Parameters
Chapter 4. Encoding the Video Stream
4.1 Preparing Input Buffers for Encoding
4.1.1 Input buffers allocated through NVIDIA Video Encoder Interface
4.1.2 Input buffers allocated externally
4.2 Configuring Per-Frame Encode Parameters
4.2.1 Forcing current frame to be encoded as intra frame
4.2.2 Forcing current frame to be used as a reference frame
4.2.3 Forcing current frame to be used as an IDR frame
4.2.4 Requesting generation of sequence parameters
4.3 Submitting Input Frame for Encoding
4.4 Retrieving Encoded Output
Chapter 5. End of Encoding
5.1 Notifying the End of Input Stream
5.2 Releasing Resources
5.3 Closing Encode Session
Chapter 6. Modes of Operation
6.1 Asynchronous Mode (Windows 7 and above)
6.2 Synchronous Mode
6.3 Threading Model
Chapter 7. Motion-Estimation-Only Mode
7.1 Query Motion-Estimation Only Mode Capability
NVIDIA GPUs based on Kepler, Maxwell and the latest Pascal architectures contain a
hardware-based H.264/HEVC video encoder (hereafter referred to as NVENC). The
NVENC hardware takes YUV/RGB as input, and generates an H.264/HEVC compliant
video bit stream. NVENC hardware’s encoding capabilities can be accessed using the
NVENCODE APIs, available in the NVIDIA Video Codec SDK.
This document provides information on how to program the NVENC using the
NVENCODE APIs exposed in the SDK. The NVENCODE APIs expose encoding
capabilities on Windows (Windows 7 and above) and Linux.
Developers can create a client application that calls NVENCODE API functions exposed
by nvEncodeAPI.dll on Windows or libnvidia-encode.so on Linux. These
libraries are installed as part of the NVIDIA display driver. The client application loads
these libraries at run time using LoadLibrary() on Windows or dlopen() on Linux.
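The following is a minimal sketch of this run-time loading on Linux (the Windows path
uses LoadLibrary()/GetProcAddress() instead of dlopen()/dlsym()).
NvEncodeAPICreateInstance and NV_ENCODE_API_FUNCTION_LIST come from
nvEncodeAPI.h; the helper name LoadNvEncApi and the omission of error handling are
assumptions of the sketch.

    /* Minimal sketch: load the NVENC library at run time and populate
     * the API function list. Error handling omitted for brevity. */
    #include <dlfcn.h>
    #include "nvEncodeAPI.h"

    typedef NVENCSTATUS (NVENCAPI *PFN_CreateInstance)(NV_ENCODE_API_FUNCTION_LIST *);

    NV_ENCODE_API_FUNCTION_LIST nvenc = { NV_ENCODE_API_FUNCTION_LIST_VER };

    void LoadNvEncApi(void)   /* hypothetical helper */
    {
        void *lib = dlopen("libnvidia-encode.so.1", RTLD_LAZY);
        PFN_CreateInstance createInstance =
            (PFN_CreateInstance)dlsym(lib, "NvEncodeAPICreateInstance");
        createInstance(&nvenc);  /* fills nvenc.nvEncOpenEncodeSessionEx, etc. */
    }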
The NVIDIA video encoder API is designed to accept raw video frames (in YUV or RGB
format) and output an H.264 or HEVC bitstream. Broadly, the encoding flow consists of
the following steps:
1. Initialize the encoder and open an encode session (Chapter 3).
2. Set up the desired encoding parameters and allocate input/output buffers.
3. Copy frames to the input buffers, submit them for encoding, and retrieve the encoded
bitstream from the output buffers (Chapter 4).
4. Notify the end of the input stream, release resources, and close the session (Chapter 5).
After loading the NVENC Interface, the client should first call the
NvEncOpenEncodeSessionEx API to open an encoding session. This function
returns an encode session handle which must be used for all subsequent calls to the API
functions in the current session.
3.1.1.1 DirectX 9
The client should create a DirectX 9 device with behavior flags including:
D3DCREATE_FPU_PRESERVE
D3DCREATE_MULTITHREADED
D3DCREATE_HARDWARE_VERTEXPROCESSING
The client should pass a pointer to IUnknown interface of the created device (typecast to
void *) as NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::device, and set
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::deviceType to
NV_ENC_DEVICE_TYPE_DIRECTX. Use of DirectX devices is supported only on Windows
7 and later versions of the Windows OS.
3.1.1.3 DirectX 11
The client should pass a pointer to IUnknown interface of the created device (typecast to
void *) as NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::device, and set
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::deviceType to
NV_ENC_DEVICE_TYPE_DIRECTX. Use of DirectX devices is supported only on Windows
7 and later versions of the Windows OS.
3.1.1.4 CUDA
The client should create a floating CUDA context, and pass the CUDA context handle as
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::device, and set
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::deviceType to
NV_ENC_DEVICE_TYPE_CUDA. Use of a CUDA device for encoding is supported on Linux
and on Windows 7 and later.
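As an illustration, the following sketch creates a CUDA context and opens an encode
session on it; the function list nvenc is assumed to have been populated as in the earlier
sketch, and device 0 is an arbitrary choice.

    /* Sketch: create a CUDA context and open an encode session on it. */
    #include <cuda.h>

    CUdevice  cuDevice;
    CUcontext cuContext;
    cuInit(0);
    cuDeviceGet(&cuDevice, 0);
    cuCtxCreate(&cuContext, 0, cuDevice);

    NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS sessionParams = { 0 };
    sessionParams.version    = NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER;
    sessionParams.deviceType = NV_ENC_DEVICE_TYPE_CUDA;
    sessionParams.device     = (void *)cuContext;
    sessionParams.apiVersion = NVENCAPI_VERSION;

    void *hEncoder = NULL;
    NVENCSTATUS status = nvenc.nvEncOpenEncodeSessionEx(&sessionParams, &hEncoder);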
Here are the steps to fetch a preset encode configuration and optionally change selected
configuration parameters (a code sketch follows the two lists below):
If the client wants to use the preset configuration as-is, it should do the following:
1. The client should specify the session parameters as described in Section 3.8.1.1.
2. Optionally, the client can enumerate and select preset GUID that best suits the
current use case, as described in Section 3.3.1. The client should then pass the
selected preset GUID using NV_ENC_INITIALIZE_PARAMS::presetGUID. This helps
the NVIDIA Video Encoder interface to correctly configure the encoder session
based on the encodeGUID and presetGUID provided.
3. The client should set the advanced codec-level parameter pointer
NV_ENC_INITIALIZE_PARAMS::encodeConfig::encodeCodecConfig to NULL.
If the client wants to fine-tune the preset configuration, it should do the following:
1. The client should specify the session parameters as described in Section 3.8.1.1.
2. The client should enumerate and select a preset GUID that best suits the current use
case, as described in Section 3.3.1, and retrieve a preset encode configuration as
described in Section 3.3.2.
3. The client may modify the returned preset encode configuration as needed and
should set NV_ENC_INITIALIZE_PARAMS::encodeConfig to point to this fine-tuned
NV_ENC_CONFIG structure.
4. Additionally, the client should also pass the selected preset GUID through
NV_ENC_INITIALIZE_PARAMS::presetGUID. This allows the NVIDIA Video
Encoder Interface to program internal parameters associated with the encoding
session to ensure that the encoded output conforms to the client's request. Note that
passing the preset GUID here will not override the fine-tuned parameters.
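The following sketch condenses the fine-tuning path above, assuming the hEncoder and
nvenc handles from the earlier sketches; the H.264 codec, default preset, 1080p30
dimensions and the gopLength change are illustrative placeholders.

    /* Sketch: fetch a preset configuration, fine-tune it, initialize. */
    NV_ENC_PRESET_CONFIG presetConfig = { 0 };
    presetConfig.version           = NV_ENC_PRESET_CONFIG_VER;
    presetConfig.presetCfg.version = NV_ENC_CONFIG_VER;
    nvenc.nvEncGetEncodePresetConfig(hEncoder, NV_ENC_CODEC_H264_GUID,
                                     NV_ENC_PRESET_DEFAULT_GUID, &presetConfig);

    NV_ENC_CONFIG encodeConfig = presetConfig.presetCfg;
    encodeConfig.gopLength = 60;            /* example fine-tuned parameter */

    NV_ENC_INITIALIZE_PARAMS initParams = { 0 };
    initParams.version      = NV_ENC_INITIALIZE_PARAMS_VER;
    initParams.encodeGUID   = NV_ENC_CODEC_H264_GUID;
    initParams.presetGUID   = NV_ENC_PRESET_DEFAULT_GUID;  /* step 4 */
    initParams.encodeWidth  = 1920;
    initParams.encodeHeight = 1080;
    initParams.frameRateNum = 30;
    initParams.frameRateDen = 1;
    initParams.encodeConfig = &encodeConfig;               /* step 3 */
    nvenc.nvEncInitializeEncoder(hEncoder, &initParams);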
The client is required to explicitly specify the following while initializing the encode
session:
1. Mode of operation: asynchronous or synchronous. Asynchronous mode encoding is
supported only on Windows 7 and later; refer to Chapter 6 for a more detailed
explanation.
2. Picture-type decision: whether the decision is taken by the encoder (enablePTD = 1)
or by the client (enablePTD = 0).
If the client wants to send the input buffers in encode order, it must set enablePTD = 0,
and must specify the following:
NV_ENC_PIC_PARAMS::pictureType
NV_ENC_PIC_PARAMS_H264/NV_ENC_PIC_PARAMS_HEVC::displayPOCSyntax
NV_ENC_PIC_PARAMS_H264/NV_ENC_PIC_PARAMS_HEVC::refPicFlag
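For illustration, these attributes map onto NV_ENC_INITIALIZE_PARAMS as in the
following sketch (continuing the initParams of the earlier sketch; the chosen values are
placeholders):

    /* Sketch: select mode of operation and picture-type decision. */
    initParams.enableEncodeAsync = 1;  /* 1 = asynchronous (Windows only),
                                          0 = synchronous                  */
    initParams.enablePTD         = 1;  /* 1 = encoder decides picture type */

    /* With enablePTD = 0, the client must instead fill
     * NV_ENC_PIC_PARAMS::pictureType (and displayPOCSyntax/refPicFlag in
     * the codec-specific structures) for every frame, submitted in
     * encode order. */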
The client may choose to allocate input buffers through the NVIDIA Video Encoder
Interface by calling the NvEncCreateInputBuffer API. In this case, the client is
responsible for destroying the allocated input buffers before closing the encode session.
It is also the client's responsibility to fill the input buffer with valid input data according
to the chosen input buffer format.
The client should allocate buffers to hold the output encoded bit stream using the
NvEncCreateBitstreamBuffer API. It is the client’s responsibility to destroy these
buffers before closing the encode session.
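A minimal allocation sketch, assuming an NV12 input format, the 1080p dimensions
used earlier, and an assumed worst-case output size:

    /* Sketch: allocate one input buffer and one bitstream buffer. */
    NV_ENC_CREATE_INPUT_BUFFER createInput = { 0 };
    createInput.version   = NV_ENC_CREATE_INPUT_BUFFER_VER;
    createInput.width     = 1920;
    createInput.height    = 1080;
    createInput.bufferFmt = NV_ENC_BUFFER_FORMAT_NV12;
    nvenc.nvEncCreateInputBuffer(hEncoder, &createInput);
    NV_ENC_INPUT_PTR inputBuffer = createInput.inputBuffer;

    NV_ENC_CREATE_BITSTREAM_BUFFER createBitstream = { 0 };
    createBitstream.version = NV_ENC_CREATE_BITSTREAM_BUFFER_VER;
    createBitstream.size    = 2 * 1024 * 1024;  /* assumed output size */
    nvenc.nvEncCreateBitstreamBuffer(hEncoder, &createBitstream);
    NV_ENC_OUTPUT_PTR outputBuffer = createBitstream.bitstreamBuffer;

    /* Destroy via nvEncDestroyInputBuffer()/nvEncDestroyBitstreamBuffer()
     * before closing the session. */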
Alternatively, in scenarios where the client cannot or does not want to allocate input
buffers through the NVIDIA Video Encoder Interface, it can use any externally allocated
DirectX resource as an input buffer. However, the client has to perform some simple
processing to map these resources to resource handles that are recognized by the
NVIDIA Video Encoder Interface before use. The translation procedure is explained in
Section 4.1.2.
If the client has used a CUDA device to initialize the encoder session, and wishes to use
input buffers NOT allocated through the NVIDIA Video Encoder Interface, it is required
to use buffers allocated using the cuMemAlloc family of APIs. The NVIDIA Video
Encoder Interface version 7.0 only supports CUdeviceptr as an input format. Support
for CUarray inputs will be added in future versions.
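The translation amounts to a one-time NvEncRegisterResource call followed by a
per-frame NvEncMapInputResource call; the sketch below shows this for a
cuMemAllocPitch'd NV12 surface (dimensions are placeholders).

    /* Sketch: register an externally allocated CUDA buffer, then map it. */
    CUdeviceptr dptr;
    size_t pitch = 0;  /* actual pitch is returned by cuMemAllocPitch */
    cuMemAllocPitch(&dptr, &pitch, 1920, 1080 * 3 / 2, 16);  /* NV12 layout */

    NV_ENC_REGISTER_RESOURCE reg = { 0 };
    reg.version            = NV_ENC_REGISTER_RESOURCE_VER;
    reg.resourceType       = NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
    reg.resourceToRegister = (void *)dptr;
    reg.width              = 1920;
    reg.height             = 1080;
    reg.pitch              = (uint32_t)pitch;
    reg.bufferFormat       = NV_ENC_BUFFER_FORMAT_NV12;
    nvenc.nvEncRegisterResource(hEncoder, &reg);

    NV_ENC_MAP_INPUT_RESOURCE map = { 0 };
    map.version            = NV_ENC_MAP_INPUT_RESOURCE_VER;
    map.registeredResource = reg.registeredResource;
    nvenc.nvEncMapInputResource(hEncoder, &map);
    /* map.mappedResource can now be passed as NV_ENC_PIC_PARAMS::inputBuffer;
     * call nvEncUnmapInputResource() once the frame completes. */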
Note: The client should allocate at least (1 + NB) input and output buffers, where NB is
the number of B frames between successive P frames.
By default, SPS/PPS data will be attached to every IDR frame. However, the client can
request the encoder to generate SPS/PPS data on demand as well. To accomplish this, set
NV_ENC_PIC_PARAMS::encodePicFlags = NV_ENC_PIC_FLAG_OUTPUT_SPSPPS. The
output frame generated for the current input will then include SPS/PPS.
The client can call NvEncGetSequenceParams at any time, after the encoder has been
initialized (NvEncInitializeEncoder) and the session is active.
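A sketch of retrieving the payload; the 256-byte buffer size is an arbitrary assumption.

    /* Sketch: retrieve the sequence parameters (SPS/PPS) out-of-band. */
    uint8_t  spspps[256];       /* arbitrary buffer size */
    uint32_t payloadSize = 0;

    NV_ENC_SEQUENCE_PARAM_PAYLOAD seqParams = { 0 };
    seqParams.version              = NV_ENC_SEQUENCE_PARAM_PAYLOAD_VER;
    seqParams.spsppsBuffer         = spspps;
    seqParams.inBufferSize         = sizeof(spspps);
    seqParams.outSPSPPSPayloadSize = &payloadSize;
    nvenc.nvEncGetSequenceParams(hEncoder, &seqParams);
    /* 'payloadSize' bytes of SPS/PPS data are now in 'spspps'. */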
Once the encode session is configured and input/output buffers are allocated, the client
can start streaming the input data for encoding. The client is required to pass a handle to
a valid input buffer and a valid bit stream (output) buffer to the NVIDIA Video Encoder
Interface for encoding an input picture.
The input picture data will be taken from the specified input buffer, and the encoded bit
stream will be available in the specified bit stream (output) buffer once the encoding
process completes.
Codec-agnostic parameters such as timestamp, duration, input buffer pointer, etc. are
passed via the structure NV_ENC_PIC_PARAMS while codec-specific parameters are
passed via the structure NV_ENC_PIC_PARAMS_H264/NV_ENC_PIC_PARAMS_HEVC
depending upon the codec in use.
The client should specify the codec-specific structure in NV_ENC_PIC_PARAMS using the
NV_ENC_PIC_PARAMS::codecPicParams member.
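A per-frame submission sketch using the buffers created earlier; the frame counter used
as a timestamp is a placeholder convention.

    /* Sketch: submit one input frame for encoding. */
    uint64_t frameIndex = 0;   /* running frame counter */

    NV_ENC_PIC_PARAMS picParams = { 0 };
    picParams.version         = NV_ENC_PIC_PARAMS_VER;
    picParams.inputWidth      = 1920;
    picParams.inputHeight     = 1080;
    picParams.inputBuffer     = inputBuffer;   /* from NvEncCreateInputBuffer */
    picParams.outputBitstream = outputBuffer;  /* from NvEncCreateBitstreamBuffer */
    picParams.bufferFmt       = NV_ENC_BUFFER_FORMAT_NV12;
    picParams.pictureStruct   = NV_ENC_PIC_STRUCT_FRAME;
    picParams.inputTimeStamp  = frameIndex++;
    NVENCSTATUS encStatus = nvenc.nvEncEncodePicture(hEncoder, &picParams);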
The CPU pointer will remain valid until the client calls NvEncUnlockBitstream. The
client should call NvEncUnlockBitstream after it completes processing the output
data.
The client must ensure that all bit stream buffers are unlocked before destroying/de-
allocating them (e.g. while closing an encode session) and before reusing them as output
buffers for subsequent frames.
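The lock/copy/unlock cycle in synchronous mode might look as follows (writing to a
file is just an example consumer):

    /* Sketch: lock the bitstream buffer, consume the output, unlock. */
    #include <stdio.h>
    FILE *outFile = fopen("out.h264", "wb");   /* example consumer */

    NV_ENC_LOCK_BITSTREAM lockParams = { 0 };
    lockParams.version         = NV_ENC_LOCK_BITSTREAM_VER;
    lockParams.outputBitstream = outputBuffer;
    lockParams.doNotWait       = 0;            /* block until output is ready */
    nvenc.nvEncLockBitstream(hEncoder, &lockParams);

    fwrite(lockParams.bitstreamBufferPtr, 1,
           lockParams.bitstreamSizeInBytes, outFile);

    nvenc.nvEncUnlockBitstream(hEncoder, outputBuffer);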
EOS notification effectively flushes the encoder. It can be issued multiple times in a
single encode session; however, it must be done before closing the encode session.
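The EOS notification itself is sent by calling NvEncEncodePicture with the
NV_ENC_PIC_FLAG_EOS flag set and no input buffer attached; a minimal sketch:

    /* Sketch: notify end of input stream, flushing any queued frames. */
    NV_ENC_PIC_PARAMS eosParams = { 0 };
    eosParams.version        = NV_ENC_PIC_PARAMS_VER;
    eosParams.encodePicFlags = NV_ENC_PIC_FLAG_EOS;  /* no input buffer */
    /* In asynchronous mode, also set eosParams.completionEvent. */
    nvenc.nvEncEncodePicture(hEncoder, &eosParams);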
The client must also ensure that all registered events are unregistered, and all mapped
input buffer handles are unmapped.
The NVIDIA Video Encoder Interface supports two modes of operation: asynchronous
mode (Windows 7 and above), in which completion of encoding is signalled to the client
through events, and synchronous mode, in which the client blocks in the lock call until
the output is ready.
6.1 Asynchronous Mode (Windows 7 and above)
1. When working in asynchronous mode, the output sample must consist of an event
plus an output buffer, and clients must work in a multi-threaded manner (the D3D9
device should be created with the MULTITHREADED flag).
2. The output buffers are allocated using the NvEncCreateBitstreamBuffer API. The
NVIDIA Video Encoder Interface will return an opaque pointer to the output
memory in NV_ENC_CREATE_BITSTREAM_BUFFER::bitstreamBuffer. This opaque
output pointer should be used in NvEncEncodePicture and NvEncLockBitstream/
NvEncUnlockBitstream calls. For accessing the output memory using the CPU, the
client must call the NvEncLockBitstream API. The number of IO buffers should be
at least 4 + number of B frames.
3. The events are Windows event handles allocated using Windows' CreateEvent API
and registered using the function NvEncRegisterAsyncEvent before encoding. The
registering of events is required only once per encoding session. Clients must
unregister the events using NvEncUnregisterAsyncEvent before destroying the
event handles. The number of event handles must be the same as the number of
output buffers, as each output buffer is associated with an event.
4. The client must create a secondary thread in which it can wait on the completion
event and copy the bitstream data from the output sample. The client will have two
threads: the main application thread, which submits encoding work to the NVIDIA
Encoder, and the secondary thread, which waits on the completion events and
copies the compressed bitstream data from the output buffers.
5. The client must send the output buffer and event in
NV_ENC_PIC_PARAMS::outputBitstream and NV_ENC_PIC_PARAMS::
completionEvent fields respectively as part of the NvEncEncodePicture API call.
6. The client should then wait on the events on the secondary thread in the same order
in which the NvEncEncodePicture calls were made, irrespective of input buffer re-
ordering (encode order != display order). The NVIDIA Encoder takes care of the
reordering in case of B frames; this is transparent to encoder clients.
7. When the event is signalled, the client must pass the output buffer associated with
that event in the NV_ENC_LOCK_BITSTREAM::outputBitstream field as part of the
NvEncLockBitstream call.
8. The NVIDIA Encoder Interface returns a CPU pointer and the bitstream size in
bytes as part of the NV_ENC_LOCK_BITSTREAM structure.
9. After copying the bitstream data, client must call NvEncUnlockBitstream for the
locked output bitstream buffer.
The following points are worth noting regarding the asynchronous mode:
1. The client will receive the event's signal and output buffer in the same order in
which they were queued.
2. The NV_ENC_LOCK_BITSTREAM::pictureType notifies the output picture type to the
clients.
3. Both the input and output samples (output buffer and the output completion event)
are free to be reused once the NVIDIA Video Encoder Interface has signalled the
event and the client has copied the data from the output buffer.
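The following sketch condenses steps 3 through 9 above for a single output buffer; the
surrounding thread creation and buffer pools are omitted, and picParams/outputBuffer
are assumed to be set up as in Chapter 4.

    /* Sketch (Windows): register a completion event, submit a frame,
     * and retrieve the output from a secondary thread. */
    #include <windows.h>

    HANDLE hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);

    NV_ENC_EVENT_PARAMS eventParams = { 0 };
    eventParams.version         = NV_ENC_EVENT_PARAMS_VER;
    eventParams.completionEvent = hEvent;
    nvenc.nvEncRegisterAsyncEvent(hEncoder, &eventParams);

    /* Main thread: attach the event and output buffer to the submission. */
    picParams.outputBitstream = outputBuffer;
    picParams.completionEvent = hEvent;
    nvenc.nvEncEncodePicture(hEncoder, &picParams);

    /* Secondary thread: wait, then lock/copy/unlock in submission order. */
    WaitForSingleObject(hEvent, INFINITE);
    NV_ENC_LOCK_BITSTREAM lockParams = { 0 };
    lockParams.version         = NV_ENC_LOCK_BITSTREAM_VER;
    lockParams.outputBitstream = outputBuffer;
    nvenc.nvEncLockBitstream(hEncoder, &lockParams);
    /* ... copy lockParams.bitstreamBufferPtr, bitstreamSizeInBytes ... */
    nvenc.nvEncUnlockBitstream(hEncoder, outputBuffer);

    /* Unregister before destroying the event handle. */
    nvenc.nvEncUnregisterAsyncEvent(hEncoder, &eventParams);
    CloseHandle(hEvent);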
The client should avoid making any blocking calls from the main encoder processing
thread. The main encoder thread should be used only for encoder initialization and to
submit work to the HW encoder using the NvEncEncodePicture API, which is non-
blocking.
NVENC can be used as a hardware accelerator to perform motion search and generate
motion vectors and mode information. The resulting motion vectors or mode decisions
can be used, for example, in motion compensated filtering or for supporting other
codecs not fully supported by NVENC or simply as motion vector hints for a custom
encoder. The procedure to use the feature is explained below.
After input resources are created, the client needs to allocate resources for the output
data by using the NvEncCreateMVBuffer API.
The pointers of the input picture buffer and the reference frame buffer need to be fed to
NV_ENC_MEONLY_PARAMS::inputBuffer and
NV_ENC_MEONLY_PARAMS::referenceFrame respectively.
In order to operate in asynchronous mode, the client should create an event and pass
this event in NV_ENC_MEONLY_PARAMS::completionEvent. This event will be signaled
upon completion of motion estimation. Each output buffer should be associated with a
distinct event pointer.
In asynchronous mode, the client should wait for the motion estimation completion
signal before reusing the output buffer or terminating the application.
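A sketch of one motion-estimation pass; inputFrame and referenceFrame stand for
input resources prepared as described in Chapter 4, and the dimensions are
placeholders.

    /* Sketch: run one motion-estimation-only pass. */
    NV_ENC_CREATE_MV_BUFFER createMV = { 0 };
    createMV.version = NV_ENC_CREATE_MV_BUFFER_VER;
    nvenc.nvEncCreateMVBuffer(hEncoder, &createMV);

    NV_ENC_MEONLY_PARAMS meParams = { 0 };
    meParams.version        = NV_ENC_MEONLY_PARAMS_VER;
    meParams.inputWidth     = 1920;
    meParams.inputHeight    = 1080;
    meParams.inputBuffer    = inputFrame;      /* current picture */
    meParams.referenceFrame = referenceFrame;  /* reference picture */
    meParams.mvBuffer       = createMV.mvBuffer;
    /* For asynchronous mode, also set meParams.completionEvent. */
    nvenc.nvEncRunMotionEstimationOnly(hEncoder, &meParams);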
8.1 LOOK-AHEAD
Look-ahead improves the video encoder's rate-control accuracy by enabling the encoder
to buffer a specified number of frames, estimate their complexity, and allocate bits
among them in proportion to their complexity.
1. The availability of the feature in the current hardware can be queried using
NvEncGetEncodeCaps and checking for NV_ENC_CAPS_SUPPORT_LOOKAHEAD.
2. The feature can be enabled by setting NV_ENC_RC_PARAMS::enableLookahead = 1.
3. The number of frames to look ahead should be set in
NV_ENC_RC_PARAMS::lookaheadDepth.
4. The client can optionally disable adaptive I-frame and B-frame insertion through
NV_ENC_RC_PARAMS::disableIadapt and NV_ENC_RC_PARAMS::disableBadapt.
5. When the feature is enabled, frames are queued up in the encoder and hence
NvEncEncodePicture will return NV_ENC_ERR_NEED_MORE_INPUT until the encoder
has a sufficient number of input frames to satisfy the look-ahead requirement.
Frames should be fed in continuously until NvEncEncodePicture returns
NV_ENC_SUCCESS.
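For illustration, the rate-control fields named above would be set on the NV_ENC_CONFIG
before NvEncInitializeEncoder (continuing the encodeConfig of the Chapter 3 sketch;
the depth of 16 is an example):

    /* Sketch: enable look-ahead in the rate-control parameters. */
    encodeConfig.rcParams.enableLookahead = 1;
    encodeConfig.rcParams.lookaheadDepth  = 16;  /* example depth */
    encodeConfig.rcParams.disableIadapt   = 0;   /* keep adaptive I insertion */
    encodeConfig.rcParams.disableBadapt   = 0;   /* keep adaptive B insertion */

    /* At encode time, keep feeding frames while NvEncEncodePicture returns
     * NV_ENC_ERR_NEED_MORE_INPUT; output becomes available once it returns
     * NV_ENC_SUCCESS. */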
8.2 ADAPTIVE QUANTIZATION (AQ)
8.2.1 Spatial AQ
Spatial AQ mode adjusts the QP values based on spatial characteristics of the frame.
Since flat, low-complexity regions are visually more sensitive to quality differences than
detailed, high-complexity regions, extra bits are allocated to flat regions of the frame
at the cost of the regions having high spatial detail. Although spatial AQ improves the
perceptible visual quality of the encoded video, the required bit redistribution results in
a PSNR drop in most cases. Therefore, during PSNR-based evaluation, this feature
should be turned off.
8.2.2 Temporal AQ
Temporal AQ tries to adjust encoding QP (on top of QP evaluated by the Rate Control
Algorithm) based on temporal characteristics of the sequence. Temporal AQ improves
the quality of encoded frames by adjusting QP for regions which are constant or have
low motion across frames but have high spatial detail, such that they become better
reference for future frames. Allocating extra bits to such regions in reference frames is
better than allocating them to the residuals in referred frames because it helps improve
the overall encoded video quality. If majority of the region within a frame has little or no
motion, but has high spatial details (e.g. high-detail non-moving background) enabling
temporal AQ will benefit the most.
One of the potential disadvantages of temporal AQ is that it may result in high
fluctuation of the bits consumed per frame within a GOP: I/P-frames will consume more
bits than the average P-frame size, and B-frames will consume fewer bits. Although the
target bitrate will be maintained at the GOP level, the frame size will fluctuate from one
frame to the next within a GOP more than it would without temporal AQ. If a strict CBR
profile is required for every frame size within a GOP, enabling temporal AQ is not
recommended. Additionally, since some of the complexity estimation is performed in
CUDA, enabling temporal AQ may introduce a small performance overhead.
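Both AQ modes are switched on through the rate-control parameters, as sketched below
(again continuing the encodeConfig of the Chapter 3 sketch; support for temporal AQ
should first be confirmed via the caps API on the target GPU):

    /* Sketch: enable spatial or temporal adaptive quantization. */
    encodeConfig.rcParams.enableAQ         = 1;  /* spatial AQ */
    /* or */
    encodeConfig.rcParams.enableTemporalAQ = 1;  /* temporal AQ,
                                                    2nd-gen Maxwell+ */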
The NVIDIA hardware video encoder is used for several purposes in various
applications. Some of the common applications include: video recording (archiving),
game-casting (broadcasting/multicasting video gameplay online), transcoding (live and
video-on-demand) and streaming (games or live content). Each of these use cases has
unique requirements for quality, bitrate, latency tolerance, performance constraints, etc.
Although the NVIDIA Encoder Interface provides the flexibility to control the settings
through a large number of APIs, the table below can be used as a general guideline for
recommended settings for some of the popular use cases to deliver the best encoded bit
stream quality. These recommendations are particularly applicable to GPUs based on the
second-generation Maxwell architecture and beyond. For earlier GPUs (Kepler and
first-generation Maxwell), it is recommended that clients use Table 1 as a starting point
and adjust the settings to achieve an appropriate performance-quality tradeoff.
[1] Recommended for low-motion games and natural video. It is observed that 3 B-frames give the most optimal quality.
[2] Available only in second-generation Maxwell GPUs and above. Temporal AQ generally gives better quality than spatial AQ but is computationally more complex.
HDMI
HDMI, the HDMI logo, and High-Definition Multimedia Interface are trademarks or registered trademarks of
HDMI Licensing LLC.
OpenCL
OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.
Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and
other countries. Other company and product names may be trademarks of the respective companies with
which they are associated.
Copyright
© 2016 NVIDIA Corporation. All rights reserved.
www.nvidia.com