NVDEC Video Decoder API Programming Guide
NVIDIA GPUs - beginning with the NVIDIA® Fermi™ generation - contain a video decoder
engine (referred to as NVDEC in this document) which provides fully-accelerated hardware
video decoding capability. NVDEC can be used for decoding bitstreams of various formats: AV1,
H.264, HEVC (H.265), VP8, VP9, MPEG-1, MPEG-2, MPEG-4 and VC-1. NVDEC runs completely
independently of the compute/graphics engines.
NVIDIA provides a software API and libraries for programming NVDEC. The software API, hereafter
referred to as NVDECODE API, lets developers access the video decoding features of NVDEC and
interoperate NVDEC with other engines on the GPU.
NVDEC decodes the compressed video streams and copies the resulting YUV frames to video
memory. With the frames in video memory, video post-processing can be done using CUDA.
The NVDECODE API also provides CUDA-optimized implementations of commonly used post-
processing operations such as scaling, cropping, aspect ratio conversion, de-interlacing and
color space conversion to many popular output video formats. The client can choose to use the
CUDA-optimized implementations provided by the NVDECODE API for these post-processing
steps or implement their own post-processing on the decoded output frames.
Decoded video frames can be presented to the display with graphics interoperability for video
playback, passed directly to a dedicated hardware encoder (NVENC) for high-performance video
transcoding, used for GPU-accelerated inferencing, or consumed further by CUDA or CPU-based
processing.
The codecs supported by NVDECODE API are:
‣ MPEG-1,
‣ MPEG-2,
‣ MPEG-4,
‣ VC-1,
‣ H.264 (AVCHD) (8 bit),
‣ H.265 (HEVC) (8 bit, 10 bit and 12 bit),
‣ VP8,
‣ VP9 (8 bit, 10 bit and 12 bit),
‣ AV1 Main profile.
Table 1 shows the codec support and capabilities of the hardware video decoder for each GPU
architecture.
| GPU Architecture | MPEG-1 & MPEG-2 | VC-1 & MPEG-4 | H.264/AVCHD | H.265/HEVC | VP8 | VP9 | AV1 |
|---|---|---|---|---|---|---|---|
| Fermi (GF1xx) | Maximum Resolution: 4080x4080 | Maximum Resolution: 2048x1024 & 1024x2048 | Maximum Resolution: 4096x4096. Profile: Baseline, Main, High profile up to Level 4.1 | Unsupported | Unsupported | Unsupported | Unsupported |
| Kepler (GK1xx) | Maximum Resolution: 4080x4080 | Maximum Resolution: 2048x1024 & 1024x2048 | Maximum Resolution: 4096x4096. Profile: Main, High profile up to Level 4.1 | Unsupported | Unsupported | Unsupported | Unsupported |
| First generation Maxwell (GM10x) | Maximum Resolution: 4080x4080 | Maximum Resolution: 2048x1024 & 1024x2048 | Maximum Resolution: 4096x4096. Profile: Baseline, Main, High profile up to Level 5.1 | Unsupported | Unsupported | Unsupported | Unsupported |
| Second generation Maxwell (GM20x, except GM206) | Maximum Resolution: 4080x4080. Max bitrate: 60 Mbps | Maximum Resolution: 2048x1024 & 1024x2048 | Maximum Resolution: 4096x4096. Profile: Baseline, Main, High profile up to Level 5.1 | Unsupported | Maximum Resolution: 4096x4096 | Unsupported | Unsupported |
| GM206 | Maximum Resolution: 4080x4080 | Maximum Resolution: 2048x1024 & 1024x2048 | Maximum Resolution: 4096x4096. Profile: Baseline, Main, High profile up to Level 5.1 | Maximum Resolution: 4096x2304. Profile: Main profile up to Level 5.1 and main10 profile | Maximum Resolution: 4096x4096 | Maximum Resolution: 4096x2304. Profile: Profile 0 | Unsupported |
| GP100 | Maximum Resolution: 4080x4080 | Maximum Resolution: 2048x1024 & 1024x2048 | Maximum Resolution: 4096x4096. Profile: Baseline, Main, High profile up to Level 5.1 | Maximum Resolution: 4096x4096. Profile: Main profile up to Level 5.1, main10 and main12 profile | Maximum Resolution: 4096x4096 | Maximum Resolution: 4096x4096. Profile: Profile 0 | Unsupported |
| GP10x/GV100/Turing/GA100 | Maximum Resolution: 4080x4080 | Maximum Resolution: 2048x1024 & 1024x2048 | Maximum Resolution: 4096x4096. Profile: Baseline, Main, High profile up to Level 5.1 | Maximum Resolution: 8192x8192. Profile: Main profile up to Level 5.1, main10 and main12 profile | Maximum Resolution: 4096x4096 [1] | Maximum Resolution: 8192x8192 [2]. Profile: Profile 0, 10-bit and 12-bit decoding | Unsupported |
| Hopper | Maximum Resolution: 4080x4080 | Maximum Resolution: 2048x1024 & 1024x2048 | Maximum Resolution: 4096x4096. Profile: Baseline, Main, High profile up to Level 5.1 | Maximum Resolution: 8192x8192. Profile: Main profile up to Level 5.1, main10 and main12 profile | Maximum Resolution: 4096x4096 | Maximum Resolution: 8192x8192. Profile: Profile 0, 10-bit and 12-bit decoding | Unsupported |
| GA10x/AD10x | Maximum Resolution: 4080x4080 | Maximum Resolution: 2048x1024 & 1024x2048 | Maximum Resolution: 4096x4096. Profile: Baseline, Main, High profile up to Level 5.1 | Maximum Resolution: 8192x8192. Profile: Main profile up to Level 5.1, main10 and main12 profile | Maximum Resolution: 4096x4096 | Maximum Resolution: 8192x8192. Profile: Profile 0, 10-bit and 12-bit decoding | Maximum Resolution: 8192x8192. Profile: Profile 0 up to Level 6.0 |

[1] VP8 decoding is supported only on select GP10x GPUs, all Turing GPUs and GA100.
[2] VP9 10-bit and 12-bit decoding is supported on select GP10x GPUs, all Turing GPUs and GA100.
The decoder pipeline consists of three major components: the demuxer, the video parser, and
the video decoder. The components are not dependent on each other and hence can be used
independently. NVDECODE API provides APIs for the NVIDIA video parser and the NVIDIA video
decoder. Of these, the NVIDIA video parser is purely a software component, and users can
implement their own parser in its place, if required.
At a high level, the following steps should be followed for decoding any video content using
NVDECODE API:
1. Create a CUDA context.
2. Query the decode capabilities of the hardware decoder.
3. Create the decoder instance(s).
4. De-mux the content and parse the video bitstream using the parser provided by NVDECODE
API or a third-party or custom parser.
5. Kick off the decoding using cuvidDecodePicture().
6. Obtain the decoded output for further processing by mapping the frame with
cuvidMapVideoFrame() and unmapping it once done.
7. Destroy the decoder instance and the CUDA context after decoding is complete.
All NVDECODE APIs are exposed in two header files: cuviddec.h and nvcuvid.h. These
headers can be found under the Interface folder in the Video Codec SDK package. The samples in
the NVIDIA Video Codec SDK statically load the library functions (the stub library ships as a part
of the SDK package for Windows) and include cuviddec.h and nvcuvid.h in the source files. The
Windows DLL nvcuvid.dll is included in the NVIDIA display driver for Windows. The Linux library
libnvcuvid.so is included with the NVIDIA display driver for Linux.
The following sections in this chapter explain the flow that should be followed to accelerate
decoding using NVDECODE API.
A parser object is created by calling cuvidCreateVideoParser() after filling
CUVIDPARSERPARAMS. The important fields of CUVIDPARSERPARAMS are described below:
‣ CodecType: Must be from enum cudaVideoCodec, indicating the codec type of the content,
such as H.264, HEVC or VP9.
‣ ulMaxNumDecodeSurfaces: This is the number of surfaces in the parser's DPB (decoded
picture buffer). This value may not be known at parser initialization time and
can be set to a dummy number like 1 to create the parser object. The application must
register a callback pfnSequenceCallback with the driver, which is called by the
parser when the parser encounters the first sequence header or any changes in
the sequence. This callback reports the minimum number of surfaces needed by the
parser's DPB for correct decoding in CUVIDEOFORMAT::min_num_decode_surfaces.
The sequence callback may return this value to the parser if it wants to
update CUVIDPARSERPARAMS::ulMaxNumDecodeSurfaces. The parser then overwrites
CUVIDPARSERPARAMS::ulMaxNumDecodeSurfaces with the value returned by the sequence
callback, if the return value of the sequence callback is greater than 1 (see the description
of pfnSequenceCallback below). Therefore, for optimum memory allocation, decoder
object creation should be deferred until CUVIDPARSERPARAMS::ulMaxNumDecodeSurfaces
is known, so that the decoder object can be created with the required number of decode
surfaces.
‣ pfnSequenceCallback: Parser triggers this callback for the initial sequence header and
when it encounters a change in the sequence. The return value from this callback is
interpreted as:
‣ 0: fail
‣ 1: succeeded, but driver should not override
CUVIDPARSERPARAMS::ulMaxNumDecodeSurfaces
‣ >1: succeeded, and driver should override
CUVIDPARSERPARAMS::ulMaxNumDecodeSurfaces with this return value
‣ pfnDecodePicture: Parser triggers this callback when bitstream data for one frame is
ready. In the case of field pictures, there may be two decode calls per one display call, since
two fields make up one frame. The return value from this callback is interpreted as:
‣ 0: fail
‣ ≥1: succeeded
‣ pfnDisplayPicture: Parser triggers this callback when a frame in display order is ready.
The return value from this callback is interpreted as:
‣ 0: fail
‣ ≥1: succeeded
‣ pfnGetOperatingPoint: Parser triggers this callback to get the operating point of an AV1
scalable stream. The parser picks a default operating point of 0 and an outputAllLayers flag
of 0 if pfnGetOperatingPoint is not set, or if the return value is -1 or an invalid operating
point. The return value from this callback is interpreted as:
‣ <0: fail
‣ ≥0: succeeded (bit 0-9: currOperatingPoint, bit 10: bOutputAllLayer)
‣ pfnGetSEIMsg: Parser triggers this callback in decode order when all the unregistered user
SEI messages or metadata OBUs are parsed for a frame. Currently, this callback is supported
for the H.264, HEVC and AV1 codecs. The return value from this callback is interpreted as:
‣ 0: fail
‣ ≥1: succeeded
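With these callbacks in place, parser creation reduces to filling CUVIDPARSERPARAMS and
calling cuvidCreateVideoParser(). The following is a minimal sketch; the handler names
(HandleVideoSequence, HandlePictureDecode, HandlePictureDisplay) and pDecoderState are
hypothetical application-side names, not part of the API:

CUVIDPARSERPARAMS parserParams = {};
parserParams.CodecType = cudaVideoCodec_H264;      // codec of the content
parserParams.ulMaxNumDecodeSurfaces = 1;           // dummy value; updated via sequence callback
parserParams.ulMaxDisplayDelay = 1;                // set to 0 for low-latency decoding
parserParams.pUserData = pDecoderState;            // passed back to every callback
parserParams.pfnSequenceCallback = HandleVideoSequence;
parserParams.pfnDecodePicture = HandlePictureDecode;
parserParams.pfnDisplayPicture = HandlePictureDisplay;

CUvideoparser hParser = NULL;
CUresult result = cuvidCreateVideoParser(&hParser, &parserParams);

The demuxed bitstream data is then fed to the parser by calling cuvidParseVideoData() with a
CUVIDSOURCEDATAPACKET, whose fields are interpreted as follows: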
‣ flags: These flags are set by the application and interpreted by the parser as below:
‣ CUVID_PKT_ENDOFSTREAM: MUST be set with the last packet for the stream. The parser
will trigger the display callback for all pending buffers in the display queue.
‣ CUVID_PKT_TIMESTAMP: Indicates that the timestamp in the packet is valid.
‣ CUVID_PKT_DISCONTINUITY: Should be set if there is any discontinuity, such as a packet
after a seek.
‣ CUVID_PKT_ENDOFPICTURE: MUST be set when the packet contains exactly one frame
or one field of data. NALU-based codecs have one frame of latency for the decode
callback, as the parser detects a frame boundary only when some non-VCL NALUs (that
belong to the next frame) are received. This flag forces the parser to skip this boundary
check and trigger the decode callback immediately. If the packet has incomplete data,
the decode callback will be triggered with partial frame data. If the packet has data for
more than one frame, the parser will trigger the decode callback for the first frame's
data; the rest of the NALUs will be dropped.
‣ CUVID_PKT_NOTIFY_EOS: If this flag is set along with CUVID_PKT_ENDOFSTREAM,
an additional (dummy) display callback will be invoked with a null value of
CUVIDPARSERDISPINFO, which should be interpreted as end of the stream.
‣ payload_size: The number of bytes in the payload.
‣ payload: Points to the bitstream memory buffer.
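For example, a sketch of feeding one demuxed packet to the parser (pCompressedData,
nCompressedBytes and pts are hypothetical outputs of the demuxer):

CUVIDSOURCEDATAPACKET packet = {};
packet.payload = pCompressedData;       // pointer to the bitstream data for this packet
packet.payload_size = nCompressedBytes;
packet.flags = CUVID_PKT_TIMESTAMP;     // the timestamp below is valid
packet.timestamp = pts;
if (nCompressedBytes == 0)              // no more input: flush the display queue
    packet.flags |= CUVID_PKT_ENDOFSTREAM;
CUresult result = cuvidParseVideoData(hParser, &packet);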
The decoded result gets associated with a picture-index value in the CUVIDPICPARAMS structure,
which is also provided by the parser. This picture index is later used to map the decoded frames
to CUDA memory.
When cuvidGetDecoderCaps() is called, the underlying driver fills the remaining fields of
CUVIDDECODECAPS, indicating the support for the queried capabilities, supported output formats,
and the maximum and minimum resolutions the hardware supports.
The following pseudo-code illustrates how to query the capabilities of NVDEC.
CUVIDDECODECAPS decodeCaps = {};
CUresult result;
// set IN params for decodeCaps
decodeCaps.eCodecType = cudaVideoCodec_HEVC;          // HEVC
decodeCaps.eChromaFormat = cudaVideoChromaFormat_420; // YUV 4:2:0
decodeCaps.nBitDepthMinus8 = 2;                       // 10 bit
result = cuvidGetDecoderCaps(&decodeCaps);
The parameters returned by the API can be interpreted as below to validate whether the content
can be decoded on the underlying hardware:
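For example, a validation sketch (codedWidth and codedHeight are hypothetical values obtained
from the sequence header):

if (!decodeCaps.bIsSupported) {
    // this codec/bit-depth/chroma-format combination is not supported on this GPU
}
if ((codedWidth > decodeCaps.nMaxWidth) ||
    (codedHeight > decodeCaps.nMaxHeight)) {
    // the content resolution exceeds the hardware limits
}
if ((codedWidth >> 4) * (codedHeight >> 4) > decodeCaps.nMaxMBCount) {
    // the macroblock count exceeds the hardware limit
}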
In most situations, the bit depth and chroma subsampling to be used at the decoder output are
the same as those at the decoder input (i.e. in the content). In certain cases, however, it may be
necessary to have the decoder produce output with a bit depth and chroma subsampling different
from those used in the input bitstream. In general, it is always a good idea to first check whether
the desired output bit depth and chroma subsampling format are supported before creating the
decoder. This can be done in the following way:
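A sketch, assuming the client wants NV12 (8-bit) or P016 (16-bit, used for 10/12-bit content)
output:

// decodeCaps.nOutputFormatMask has bit N set if the cudaVideoSurfaceFormat
// with enum value N is a supported output format
if (decodeCaps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_NV12)) {
    // 8-bit semi-planar NV12 output is supported
}
if (decodeCaps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_P016)) {
    // 16-bit semi-planar P016 output is supported
}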
The decoder instance is created by calling cuvidCreateDecoder() after filling
CUVIDDECODECREATEINFO; its important fields are described below:
‣ bitDepthMinus8: Bit depth minus 8 of the video stream to be decoded, e.g. 0 for 8-bit, 2 for
10-bit, 4 for 12-bit.
‣ ulNumDecodeSurfaces: Referred to as decode surfaces elsewhere in this document,
this is the number of surfaces that the driver will internally allocate for storing
the decoded frames. Using a higher number ensures better pipelining but increases
GPU memory consumption. For correct operation, the minimum value is defined in
CUVIDEOFORMAT::min_num_decode_surfaces and can be obtained from the first sequence
callback from the NVIDIA parser. The NVDEC engine writes decoded data to one of these
surfaces. These surfaces are not accessible by the user of NVDECODE API, but the mapping
stage (which includes decoder output format conversion, scaling, cropping, etc.) uses these
surfaces as input surfaces.
‣ ulNumOutputSurfaces: This is the maximum number of output surfaces that the
client will simultaneously map to decode surfaces for further processing using
cuvidMapVideoFrame(). These surfaces hold the post-processed decoded output to be used
by the client. The driver internally allocates the corresponding number of surfaces (referred
to as output surfaces in this document). The client will have access to the output surfaces.
Refer to the section Preparing the decoded frame for further processing to understand the
definition of map.
‣ OutputFormat: Output surface format, defined as enum cudaVideoSurfaceFormat.
This output format must be one of the supported formats obtained in
decodeCaps.nOutputFormatMask from cuvidGetDecoderCaps(). If an unsupported output
format is passed, the API will fail with the error CUDA_ERROR_NOT_SUPPORTED.
‣ ulTargetWidth, ulTargetHeight: This is the resolution of the output surfaces. For use
cases which involve no scaling, these should be set to ulWidth and ulHeight, respectively.
‣ DeinterlaceMode: This should be set to cudaVideoDeinterlaceMode_Weave
or cudaVideoDeinterlaceMode_Bob for progressive content and
cudaVideoDeinterlaceMode_Adaptive for interlaced content.
cudaVideoDeinterlaceMode_Adaptive yields better quality but increases memory
consumption.
‣ ulCreationFlags: Defined as enum cudaVideoCreateFlags. It is optional to explicitly
define this flag; the driver will pick the appropriate mode if it is not defined.
‣ ulIntraDecodeOnly: Set this flag to 1 to instruct the driver that the content being decoded
contains only I/IDR frames. This helps the driver optimize memory consumption. Do not set
this flag if the content has non-intra frames.
‣ enableHistogram: Set this flag to 1 to enable histogram data collection.
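For illustration, a minimal decoder-creation sketch (assuming pFormat points to the
CUVIDEOFORMAT received in the first sequence callback, the content is progressive, and NV12
output is supported):

CUVIDDECODECREATEINFO createInfo = {};
createInfo.CodecType = pFormat->codec;
createInfo.ChromaFormat = pFormat->chroma_format;
createInfo.bitDepthMinus8 = pFormat->bit_depth_luma_minus8;
createInfo.ulWidth = pFormat->coded_width;
createInfo.ulHeight = pFormat->coded_height;
createInfo.ulNumDecodeSurfaces = pFormat->min_num_decode_surfaces;
createInfo.ulNumOutputSurfaces = 2;
createInfo.OutputFormat = cudaVideoSurfaceFormat_NV12;
createInfo.DeinterlaceMode = cudaVideoDeinterlaceMode_Weave;  // progressive content
createInfo.ulTargetWidth = pFormat->coded_width;              // no scaling
createInfo.ulTargetHeight = pFormat->coded_height;

CUvideodecoder hDecoder = NULL;
CUresult result = cuvidCreateDecoder(&hDecoder, &createInfo);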
The cuvidCreateDecoder() call fills CUvideodecoder with the decoder handle, which should
be retained as long as the decode session is active. The handle needs to be passed to subsequent
NVDECODE API calls.
The user can also specify the following parameters in the CUVIDDECODECREATEINFO to control
the final output:
‣ Scaling dimension
‣ Cropping dimension
‣ Dimension if the user wants to change the aspect ratio
The following code demonstrates the setup of decoder in case of scaling, cropping, or aspect
ratio conversion.
// Scaling. Source size is 1280x960. Scale to 1920x1080.
CUresult rResult;
unsigned int uScaleW, uScaleH;
uScaleW = 1920;
uScaleH = 1080;
...
CUVIDDECODECREATEINFO stDecodeCreateInfo;
memset(&stDecodeCreateInfo, 0, sizeof(CUVIDDECODECREATEINFO));
... // Setup the remaining structure members
stDecodeCreateInfo.ulTargetWidth = uScaleW;
stDecodeCreateInfo.ulTargetHeight = uScaleH;
rResult = cuvidCreateDecoder(&hDecoder, &stDecodeCreateInfo);
...
After de-muxing and parsing, the client can submit the bitstream for a frame to NVDEC for
decoding, as follows:
‣ The client needs to fill up the structure with parameters derived during the parsing
process. CUVIDPICPARAMS contains a structure specific to every supported codec, which
should also be filled up.
‣ Call cuvidDecodePicture() and pass the decoder handle and the pointer to
CUVIDPICPARAMS. cuvidDecodePicture() kicks off the decoding on NVDEC.
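For example, a decode callback might look like the following sketch (HandlePictureDecode and
the DecoderState type are hypothetical application-side names):

static int CUDAAPI HandlePictureDecode(void *pUserData, CUVIDPICPARAMS *pPicParams)
{
    DecoderState *pState = (DecoderState *)pUserData;  // hypothetical state object
    // Kick off decoding of this picture on NVDEC
    CUresult result = cuvidDecodePicture(pState->hDecoder, pPicParams);
    return (result == CUDA_SUCCESS) ? 1 : 0;           // >=1: succeeded, 0: fail
}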
Histogram data collection is requested by setting the enableHistogram flag while creating the
decoder (using the API cuvidCreateDecoder()). The CUDA device pointer of the histogram
buffer can be obtained from CUVIDPROCPARAMS::histogram_dptr.
The histogram buffer is mapped to the output buffer in the driver, so cuvidUnmapVideoFrame()
also unmaps the histogram buffer along with the output surface.
The following code demonstrates how to use cuvidMapVideoFrame() and
cuvidUnmapVideoFrame() for accessing histogram buffer.
// MapFrame: Call cuvidMapVideoFrame and get the output frame and associated
// histogram buffer CUDA device pointer
CUVIDPROCPARAMS stProcParams;
CUresult rResult;
unsigned long long cuOutputFramePtr = 0, cuHistogramPtr = 0;
int nPitch;
int histogram_size = (decodecaps.nCounterBitDepth / 8) *
decodecaps.nMaxHistogramBins;
unsigned char *pHistogramPtr = nullptr;
memset(&stProcParams, 0, sizeof(CUVIDPROCPARAMS));
/*************************************************
* setup stProcParams
**************************************************/
stProcParams.histogram_dptr = &cuHistogramPtr;
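// MapFrame (continuation sketch): nPicIdx is the picture index received in
// the display callback; assumes a 64-bit build, where cuvidMapVideoFrame
// maps to cuvidMapVideoFrame64
rResult = cuvidMapVideoFrame(hDecoder, nPicIdx, &cuOutputFramePtr,
    (unsigned int *)&nPitch, &stProcParams);
// Allocate a host buffer and copy the histogram data from device memory
pHistogramPtr = (unsigned char *)malloc(histogram_size);
if (pHistogramPtr)
{
    rResult = cuMemcpyDtoH(pHistogramPtr, cuHistogramPtr, histogram_size);
}
// UnmapFrame: this also unmaps the histogram buffer
rResult = cuvidUnmapVideoFrame(hDecoder, cuOutputFramePtr);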
The client can query the status of decoding of a frame by calling cuvidGetDecodeStatus() at
any time after decoding has been kicked off for that frame. The reported status is one of the
following:
‣ Decoding is in progress.
‣ Decoding of the frame completed successfully.
‣ The bitstream for the frame was corrupted and concealed by NVDEC.
‣ The bitstream for the frame was corrupted, but could not be concealed by NVDEC.
The API is expected to help in scenarios where the client needs to take a further decision based
on the decoding status of the frame, for example whether to carry out inferencing on the frame
or not.
Please note that NVDEC can detect a limited number of errors, depending on the codec. This
API is supported for HEVC, H.264 and JPEG on Maxwell and above generation GPUs.
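A sketch of such a query (nPicIdx is the hypothetical picture index of the frame being checked):

CUVIDGETDECODESTATUS decodeStatus;
memset(&decodeStatus, 0, sizeof(decodeStatus));
CUresult result = cuvidGetDecodeStatus(hDecoder, nPicIdx, &decodeStatus);
if (result == CUDA_SUCCESS &&
    (decodeStatus.decodeStatus == cuvidDecodeStatus_Error ||
     decodeStatus.decodeStatus == cuvidDecodeStatus_Error_Concealed))
{
    // The frame was corrupted; the client can, e.g., skip inferencing on it
}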
Instead of statically linking against nvcuvid.dll or libnvcuvid.so, the client can also load
these libraries at run time (using LoadLibrary() on Windows or dlopen() on Linux) and perform
run-time dynamic linking of these libraries if needed. The code snippets below can help in
understanding the changes needed in programming style:
#ifdef _WIN32
#ifdef UNICODE
static LPCWSTR __DriverLibName = L"nvcuvid.dll";
#else
static LPCSTR __DriverLibName = "nvcuvid.dll";
#endif

static HMODULE DriverLib;

static CUresult LOAD_LIBRARY(HMODULE *pInstance)
{
    *pInstance = LoadLibrary(__DriverLibName);
    if (*pInstance == NULL)
    {
#ifdef UNICODE
        wprintf(L"LoadLibrary \"%s\" failed!\n", __DriverLibName);
#else
        printf("LoadLibrary \"%s\" failed!\n", __DriverLibName);
#endif
        return CUDA_ERROR_UNKNOWN;
    }
    return CUDA_SUCCESS;
}
#else  // Linux: use dlopen/dlsym
#include <dlfcn.h>

static const char __DriverLibName[] = "libnvcuvid.so";

static void *DriverLib;

static CUresult LOAD_LIBRARY(void **pInstance)
{
    *pInstance = dlopen(__DriverLibName, RTLD_NOW);
    if (*pInstance == NULL)
    {
        printf("dlopen \"%s\" failed!\n", __DriverLibName);
        return CUDA_ERROR_UNKNOWN;
    }
    return CUDA_SUCCESS;
}
#endif
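// Function pointers to the NVDECODE API entry points, resolved at run time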
tcuvidCreateVideoParser *cuvidCreateVideoParser;
tcuvidParseVideoData *cuvidParseVideoData;
tcuvidDestroyVideoParser *cuvidDestroyVideoParser;
tcuvidGetDecoderCaps *cuvidGetDecoderCaps;
tcuvidCreateDecoder *cuvidCreateDecoder;
tcuvidDestroyDecoder *cuvidDestroyDecoder;
tcuvidDecodePicture *cuvidDecodePicture;
#define CHECKED_CALL(call) \
do { \
CUresult result = (call); \
if (CUDA_SUCCESS != result) { \
return result; \
} \
} while(0)
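A possible definition of the GET_PROC macro used below, assuming the t<name> function-type
typedefs shown earlier and the DriverLib handle filled in by LOAD_LIBRARY (this definition is a
sketch, not part of the SDK):

#ifdef _WIN32
#define GET_PROC(name) \
    do { \
        name = (t##name *)GetProcAddress(DriverLib, #name); \
        if (name == NULL) return CUDA_ERROR_UNKNOWN; \
    } while (0)
#else
#define GET_PROC(name) \
    do { \
        name = (t##name *)dlsym(DriverLib, #name); \
        if (name == NULL) return CUDA_ERROR_UNKNOWN; \
    } while (0)
#endif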
// A hypothetical init routine tying the pieces together:
static CUresult InitNvcuvidFunctions(void)
{
    CHECKED_CALL(LOAD_LIBRARY(&DriverLib));
    GET_PROC(cuvidGetDecoderCaps);
    GET_PROC(cuvidCreateDecoder);
    GET_PROC(cuvidDestroyDecoder);
    GET_PROC(cuvidDecodePicture);
    return CUDA_SUCCESS;
}
The current PCIe link width can be checked by running the command 'nvidia-smi -q'. PCIe link
width can be configured in the system's BIOS settings.
In use cases where there are frequent changes of decode resolution and/or post-processing
parameters, it is recommended to use cuvidReconfigureDecoder() instead of destroying the
existing decoder instance and recreating a new one.
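A sketch of such a reconfiguration (newWidth, newHeight and newNumDecodeSurfaces are
hypothetical values taken from the new sequence header; they must not exceed the
ulMaxWidth/ulMaxHeight specified in CUVIDDECODECREATEINFO at creation time):

CUVIDRECONFIGUREDECODERINFO reconfigParams = {};
reconfigParams.ulWidth = newWidth;
reconfigParams.ulHeight = newHeight;
reconfigParams.ulTargetWidth = newWidth;    // no scaling
reconfigParams.ulTargetHeight = newHeight;
reconfigParams.ulNumDecodeSurfaces = newNumDecodeSurfaces;
CUresult result = cuvidReconfigureDecoder(hDecoder, &reconfigParams);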
The following steps should be followed for optimizing video memory usage:
1. Set CUVIDDECODECREATEINFO::ulNumDecodeSurfaces = CUVIDEOFORMAT::
min_num_decode_surfaces. This will ensure that the underlying driver allocates the
minimum number of decode surfaces needed to correctly decode the sequence. In
case there is a reduction in decoder performance, the client can slightly increase
CUVIDDECODECREATEINFO::ulNumDecodeSurfaces. It is therefore recommended to
choose the optimal value of CUVIDDECODECREATEINFO::ulNumDecodeSurfaces to ensure
the right balance between decoder throughput and memory consumption.
2. CUVIDDECODECREATEINFO::ulNumOutputSurfaces should be decided optimally after due
experimentation for balancing decoder throughput and memory consumption.
3. For progressive content, CUVIDDECODECREATEINFO::DeinterlaceMode should be set to
cudaVideoDeinterlaceMode_Weave or cudaVideoDeinterlaceMode_Bob. For interlaced
content, choosing cudaVideoDeinterlaceMode_Adaptive results in higher quality but
increases memory consumption; using cudaVideoDeinterlaceMode_Weave or
cudaVideoDeinterlaceMode_Bob results in minimum memory consumption, though it
may result in lower video quality. If CUVIDDECODECREATEINFO::DeinterlaceMode is not
specified by the client, the underlying display driver sets it to
cudaVideoDeinterlaceMode_Adaptive, which results in higher memory consumption.
Hence it is strongly recommended to choose the right value of
CUVIDDECODECREATEINFO::DeinterlaceMode depending on the requirement.
4. While decoding multiple streams, it is recommended to allocate the minimum number of
CUDA contexts and share them across sessions. This saves the memory overhead associated
with CUDA context creation.
5. CUVIDDECODECREATEINFO::ulIntraDecodeOnly should be set to 1 if it is known
beforehand that the sequence contains intra frames only. This feature is supported only for
HEVC, H.264 and VP9. Note that decoding might fail if this flag is set for regular bitstreams of
these codecs containing P and/or B frames.
The sample applications included with the Video Codec SDK are written to demonstrate the
functionality of various APIs, but they may not be fully optimized. Hence programmers are
strongly encouraged to ensure that their application is well-designed, with various stages in
the decode-postprocess-display pipeline structured in an efficient manner to achieve desired
performance and memory consumption.
NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.
Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgment, unless otherwise agreed in
an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any
customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed
either directly or indirectly by this document.
NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications
where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA
accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product
is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document,
ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of
the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional
or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem
which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
Trademarks
NVIDIA, the NVIDIA logo, and cuBLAS, CUDA, CUDA Toolkit, cuDNN, DALI, DIGITS, DGX, DGX-1, DGX-2, DGX Station, DLProf, GPU, Jetson, Kepler, Maxwell, NCCL,
Nsight Compute, Nsight Systems, NVCaffe, NVIDIA Deep Learning SDK, NVIDIA Developer Program, NVIDIA GPU Cloud, NVLink, NVSHMEM, PerfWorks, Pascal,
SDK Manager, Tegra, TensorRT, TensorRT Inference Server, Tesla, TF-TRT, Triton Inference Server, Turing, and Volta are trademarks and/or registered trademarks
of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which
they are associated.
Copyright
© 2010-2024 NVIDIA Corporation. All rights reserved.