Optimize VideoCapture with cvtColor instead of sws_scale #27652

Draft
wants to merge 4 commits into
base: 4.x
from

dkurt
Member

@dkurt dkurt commented Aug 10, 2025

Pull Request Readiness Checklist

Test code that reads every frame of a 1600x900, roughly 1-hour (55:27) H.264 MP4 video through the FFmpeg backend:

  • using CPU (sws_scale -> cvtColor):
    total time: 45.6 seconds -> 30.6 seconds

  • using GPU HW acceleration (sws_scale -> cvtColor):
    total time: 232 seconds -> 69 seconds

    HW utilization with sws_scale:

    Device 0 [NVIDIA GeForce RTX 4090] PCIe GEN 4@16x RX: 54.54 MiB/s TX: 869.8 MiB/s
    GPU 2805MHz MEM 10501MH TEMP  34°C FAN  32% POW 109 / 480 W
    GPU[||                        7%] MEM[|          0.905Gi/23.988Gi] DEC[|||   29%]
    

    HW utilization with cvtColor:

    Device 0 [NVIDIA GeForce RTX 4090] PCIe GEN 4@16x RX: 46.44 MiB/s TX: 2.885 GiB/s
    GPU 2805MHz MEM 10501MH TEMP  34°C FAN  32% POW 107 / 480 W
    GPU[||||||                   21%] MEM[|          0.905Gi/23.988Gi] DEC[|||||100%]
    
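import os
import time

import cv2 as cv

# Request cuvid hardware decoding through the FFmpeg backend.
# Set this before opening the capture; remove the line to decode on the CPU.
os.environ["OPENCV_FFMPEG_CAPTURE_OPTIONS"] = "hwaccel;cuvid|video_codec;h264_cuvid|vsync;0"

start = time.time()
cap = cv.VideoCapture("test.mp4", cv.CAP_FFMPEG)
assert cap.isOpened()

# Read and discard every frame so that only decoding and color conversion are timed.
while True:
    has_frame, frame = cap.read()
    if not has_frame:
        break
print(time.time() - start)
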
$ ffprobe -i test.mp4

ffprobe version n7.0.3 Copyright (c) 2007-2025 the FFmpeg developers
  built with gcc 13 (Ubuntu 13.2.0-23ubuntu4)
  configuration: --enable-nonfree --enable-cuda-nvcc --enable-libnpp --enable-nvdec --enable-swresample --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 --disable-static --enable-shared
  libavutil      59.  8.100 / 59.  8.100
  libavcodec     61.  3.100 / 61.  3.100
  libavformat    61.  1.100 / 61.  1.100
  libavdevice    61.  1.100 / 61.  1.100
  libavfilter    10.  1.100 / 10.  1.100
  libswscale      8.  1.100 /  8.  1.100
  libswresample   5.  1.100 /  5.  1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf60.3.100
  Duration: 00:55:27.48, start: 0.000000, bitrate: 199 kb/s
  Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1600x900, 126 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
      Metadata:
        handler_name    : VideoHandler
        vendor_id       : [0][0][0][0]
        encoder         : Lavc60.3.100 libx264
  Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 66 kb/s (default)
      Metadata:
        handler_name    : SoundHandler
        vendor_id       : [0][0][0][0]
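
As a quick sanity check on the numbers above, the reported timings can be turned into speedups and approximate decode throughput. This is only a sketch: the frame count is an estimate derived from the ffprobe duration (00:55:27.48) and frame rate (25 fps), not a measured value, and the names `timings`, `approx_frames` and `speedup` are made up for this illustration.

```python
# Timings reported in this PR description, in seconds: (sws_scale, cvtColor).
timings = {
    "CPU decode": (45.6, 30.6),
    "GPU decode (cuvid)": (232.0, 69.0),
}

# From the ffprobe output: Duration 00:55:27.48 at 25 fps.
duration_s = 55 * 60 + 27.48
approx_frames = round(duration_s * 25)  # estimate, not a measured frame count


def speedup(before, after):
    """Return how many times faster `after` is compared to `before`."""
    return before / after


for name, (before, after) in timings.items():
    print(
        f"{name}: {speedup(before, after):.2f}x faster, "
        f"~{approx_frames / before:.0f} -> ~{approx_frames / after:.0f} frames/s"
    )
```

Under these assumptions the CPU path comes out at roughly 1.5x faster and the GPU path at roughly 3.4x faster.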

resolves #21969

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov
Contributor

What does "using CPU: total time: 45.6 seconds" mean?

@asmorkalov asmorkalov added this to the 4.13.0 milestone Aug 11, 2025
@asmorkalov asmorkalov self-requested a review August 11, 2025 07:52
@asmorkalov asmorkalov self-assigned this Aug 11, 2025
@dkurt
Member Author

dkurt commented Aug 11, 2025

What does "using CPU: total time: 45.6 seconds" mean?

It is the same Python script, but without the line os.environ["OPENCV_FFMPEG_CAPTURE_OPTIONS"] = "hwaccel;cuvid|video_codec;h264_cuvid|vsync;0", so no hardware decoding is requested.

I've found that sws_scale for AV_PIX_FMT_YUV420P is also slower than OpenCV's cvtColor (with cvtColor the script can finish in about 30 seconds, which is roughly 1.5x faster). I will update this PR to demonstrate it later.

@dkurt dkurt changed the title from "Use Opencv NV12 to RGB instead of sws_scale" to "Optimize VideoCapture with OpenCV cvtColor instead of sws_scale" Aug 11, 2025
@dkurt dkurt changed the title from "Optimize VideoCapture with OpenCV cvtColor instead of sws_scale" to "Optimize VideoCapture with cvtColor instead of sws_scale" Aug 11, 2025
Development

Successfully merging this pull request may close these issues.

After setting the decoder to hevc_cuvid, VideoCapture reads video more slowly
2 participants