add audio support in cap_msmf #19721

MaximMilashchenko · 2021-03-13T09:07:21Z

Merge with extra: opencv/opencv_extra#864

@alalek, @allnes, @fzhar
Issue #16394
Here is the link to the opencv_extra PR: opencv/opencv_extra#864.

MaximMilashchenko · 2021-03-19T08:18:13Z

@alalek, I fixed the errors, but there is a warning because the patch size limit is exceeded. Also, the build failed on one of the Jenkins builders.

alalek

patch size opencv_extra: 114698 KiB

Where is the link to the opencv_extra PR?

114 MB is very large; the existing video test data is even smaller. Test data must be reduced.

@allnes, do we have an issue about audio support? A link should be added.

The PR description should not be empty. At a minimum, it should define the scope of work, what should be validated and how, prerequisites, etc.

alalek · 2021-03-23T00:41:08Z

modules/videoio/include/opencv2/videoio/legacy/constants_c.h

@@ -315,6 +315,9 @@ enum
    CV_CAP_PROP_XI_SENSOR_FEATURE_SELECTOR                      = 585, // Selects the current feature which is accessible by XI_PRM_SENSOR_FEATURE_VALUE.
    CV_CAP_PROP_XI_SENSOR_FEATURE_VALUE                         = 586, // Allows access to sensor feature value currently selected by XI_PRM_SENSOR_FEATURE_SELECTOR.

+    //Properties of audio VideoIO
+    CV_CAP_PROP_AUDIO_ENABLE  = 1000, // Select audio or video
+    CV_CAP_PROP_BPS            = 1001, // Change bit_per_sample parametr for audio


legacy/constants_c.h

Legacy files must not be touched without critical reasons.

I changed this file to use the same style of property names inside the backend (CV_CAP_PROP_...). If I don't change the file, then I need to use CAP_PROP_... instead.

alalek · 2021-03-23T00:44:19Z

modules/videoio/include/opencv2/videoio.hpp

@@ -185,6 +185,8 @@ enum VideoCaptureProperties {
       CAP_PROP_ORIENTATION_AUTO=49, //!< if true - rotates output frames of CvCapture considering video file's metadata  (applicable for FFmpeg back-end only) (https://github.com/opencv/opencv/issues/15499)
       CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific.
       CAP_PROP_HW_DEVICE      =51, //!< (**open-only**) Hardware device index (select GPU if multiple available)
+       CAP_PROP_AUDIO_ENABLE =1000,
+       CAP_PROP_BPS           =1001,


CAP_PROP_BPS

What does that mean in the context of VideoCapture?
"Bytes per second"? (that would be the most incorrect reading)

alalek · 2021-03-23T00:44:47Z

modules/videoio/include/opencv2/videoio.hpp

@@ -185,6 +185,8 @@ enum VideoCaptureProperties {
       CAP_PROP_ORIENTATION_AUTO=49, //!< if true - rotates output frames of CvCapture considering video file's metadata  (applicable for FFmpeg back-end only) (https://github.com/opencv/opencv/issues/15499)
       CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific.
       CAP_PROP_HW_DEVICE      =51, //!< (**open-only**) Hardware device index (select GPU if multiple available)
+       CAP_PROP_AUDIO_ENABLE =1000,
+       CAP_PROP_BPS           =1001,


Proper documentation must be added.

alalek · 2021-03-23T01:02:48Z

modules/videoio/include/opencv2/videoio.hpp

@@ -185,6 +185,8 @@ enum VideoCaptureProperties {
       CAP_PROP_ORIENTATION_AUTO=49, //!< if true - rotates output frames of CvCapture considering video file's metadata  (applicable for FFmpeg back-end only) (https://github.com/opencv/opencv/issues/15499)
       CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific.
       CAP_PROP_HW_DEVICE      =51, //!< (**open-only**) Hardware device index (select GPU if multiple available)
+       CAP_PROP_AUDIO_ENABLE =1000,


=1000

Why? Why not 10000000?

I wanted the value in this file to match the values in the constants_c.h file.

alalek · 2021-03-23T01:16:28Z

modules/videoio/test/test_audio.cpp

@@ -0,0 +1,150 @@
+#include <tuple>


Mandatory requirements of OpenCV library:

proper license header

test_precomp.hpp must be included first.

alalek · 2021-03-23T02:10:08Z

modules/videoio/include/opencv2/videoio.hpp

@@ -185,6 +185,8 @@ enum VideoCaptureProperties {
       CAP_PROP_ORIENTATION_AUTO=49, //!< if true - rotates output frames of CvCapture considering video file's metadata  (applicable for FFmpeg back-end only) (https://github.com/opencv/opencv/issues/15499)
       CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific.
       CAP_PROP_HW_DEVICE      =51, //!< (**open-only**) Hardware device index (select GPU if multiple available)
+       CAP_PROP_AUDIO_ENABLE =1000,


// Select audio or video

We should support processing of both video and audio streams. This is why we are integrating into VideoCapture instead of adding dedicated Audio API.

The next requirement is that video and audio must be synchronized in some way (these details need to be defined and written down somewhere; other "backends" and user apps should follow these guidelines).

Also we need these properties:

(Priority 0) CAP_PROP_VIDEO_STREAM - (open-only, 0-based index or -1 to disable) - Default value is 0.

(Priority 0) CAP_PROP_AUDIO_STREAM - (open-only, 0-based index or -1 to disable) - Specifies the stream in multi-language media files. -1 disables audio processing (default).

(Priority 1) CAP_PROP_AUDIO_POS - (read-only, in samples) accurate audio sample timestamp of the previously grabbed fragment. See CAP_PROP_AUDIO_SAMPLES_PER_SECOND.

(Priority 1) CAP_PROP_AUDIO_DATA_DEPTH - (open-only, Mat::depth()) Default value is -1: use "native" file/codec information. Alternative definition to bits-per-sample, but with clear handling of 32F / 32S.

(Priority 1) CAP_PROP_AUDIO_SAMPLES_PER_SECOND - (open-only) 0 - determine from file/codec input. If not specified through parameters or input, then selected audio sample rate is 44100. Note: resampling is not performed by OpenCV.

~~(Priority 2) CAP_PROP_AUDIO_CHANNELS - (open-only?, bitset) - to properly handle stereo or 5.1 streams. ? 0 - use average of all channels. Use -1 to grab all channels. Default value is 0 / -1 ?~~

(Priority 3) CAP_PROP_AUDIO_TOTAL_SAMPLES - (read-only) Number of samples in the file. Returns zero if information is not available (live streams).

(Priority 3) CAP_PROP_AUDIO_TOTAL_CHANNELS - (read-only) Number of audio channels in the selected audio stream.

(Priority 4) CAP_PROP_AUDIO_TOTAL_STREAMS - (read-only) Number of audio streams in the media.
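As a rough sketch of how a backend might apply the open-time defaults proposed above, the code below parses the flat key/value params vector that VideoCapture::open() accepts. The enum values and the OpenSettings, parseOpenParams and resolveSampleRate names are hypothetical placeholders, not the final OpenCV constants or implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical property keys for illustration only; the real
// cv::VideoCaptureProperties values are defined in videoio.hpp.
enum HypotheticalProp
{
    PROP_VIDEO_STREAM = 1,
    PROP_AUDIO_STREAM = 2,
    PROP_AUDIO_DATA_DEPTH = 3,
    PROP_AUDIO_SAMPLES_PER_SECOND = 4
};

struct OpenSettings
{
    int videoStream = 0;       // default: first video stream
    int audioStream = -1;      // default: audio processing disabled
    int audioDataDepth = -1;   // -1: use "native" file/codec depth
    int samplesPerSecond = 0;  // 0: determine from file/codec input
};

// Parses a flat {key, value, key, value, ...} vector, the same layout
// used by the params argument of VideoCapture::open().
OpenSettings parseOpenParams(const std::vector<int>& params)
{
    if (params.size() % 2 != 0)
        throw std::invalid_argument("params must contain key/value pairs");
    OpenSettings s;
    for (std::size_t i = 0; i < params.size(); i += 2)
    {
        const int value = params[i + 1];
        switch (params[i])
        {
        case PROP_VIDEO_STREAM: s.videoStream = value; break;
        case PROP_AUDIO_STREAM: s.audioStream = value; break;
        case PROP_AUDIO_DATA_DEPTH: s.audioDataDepth = value; break;
        case PROP_AUDIO_SAMPLES_PER_SECOND: s.samplesPerSecond = value; break;
        default: break; // unknown keys are ignored in this sketch
        }
    }
    return s;
}

// Proposed fallback: if neither params nor the input define a sample
// rate, 44100 is used (no resampling is implied).
int resolveSampleRate(const OpenSettings& s, int rateFromInput)
{
    if (s.samplesPerSecond > 0)
        return s.samplesPerSecond;
    if (rateFromInput > 0)
        return rateFromInput;
    return 44100;
}
```

For example, {PROP_AUDIO_STREAM, 0, PROP_VIDEO_STREAM, -1} would request audio only.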

44100 has a problem with 24 FPS video (1837.5 audio samples per frame). 48000 is good for 24, 25 and 30 FPS media streams, but it is less widely supported by codecs. See https://en.wikipedia.org/wiki/Sampling_(signal_processing)
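The frame-rate arithmetic behind this choice can be checked directly. A small plain C++ sketch (no OpenCV; the function names are illustrative) that computes samples per video frame for both rates and flags fractional values:

```cpp
#include <cassert>
#include <cstdio>

// True if sampleRate splits evenly into frames at an integer fps.
bool isWholeSamplesPerFrame(int sampleRate, int fps)
{
    return sampleRate % fps == 0;
}

// Samples per video frame as a floating-point value.
double samplesPerFrame(int sampleRate, double fps)
{
    return sampleRate / fps;
}

// Prints one line per (rate, fps) combination.
void printTable()
{
    const int rates[] = {44100, 48000};
    const int fpsList[] = {24, 25, 30};
    for (int rate : rates)
        for (int fps : fpsList)
            std::printf("%d Hz @ %d fps: %.1f samples/frame%s\n", rate, fps,
                        samplesPerFrame(rate, fps),
                        isWholeSamplesPerFrame(rate, fps) ? "" : " (fractional)");
}
```

Of the six combinations, only 44100 Hz at 24 FPS gives a fractional count (1837.5); 48000 Hz divides evenly at 24, 25 and 30 FPS.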

Update 2021-04-30:

drop CAP_PROP_AUDIO_CHANNELS due to confusion with CAP_PROP_AUDIO_TOTAL_CHANNELS.

will be replaced by CAP_PROP_AUDIO_CHANNELS_RETRIEVE_MODE later (not in the scope of this PR)

TBD later

alalek · 2021-03-23T02:12:16Z

samples/cpp/videocapture_audio_combination.cpp

+    int apiID = cv::CAP_MSMF;
+    //congigurate VideoCapture for video and audio
+    VideoCapture cap_video;
+    VideoCapture cap_audio;


Wrong way.

As said above, we are integrating into VideoCapture to handle both streams instead of adding a dedicated Audio API.

alalek · 2021-03-23T02:16:37Z

samples/cpp/videocapture_audio_combination.cpp

+        return -1;
+    }
+    // open selected micro using selected API
+    std::vector<int> params { CAP_PROP_AUDIO_ENABLE , static_cast<int>(1) };


static_cast<int>(1)

why?

I followed how the parameter vector is created for the video writer (cap.cpp). There a bool is cast to int; in my case that is not required. I will fix this.

alalek · 2021-03-23T02:23:24Z

samples/cpp/videocapture_audio.cpp

+    // open selected micro using selected API
+    cap.open(0, apiID, params);
+    if (!cap.isOpened()) {
+        cerr << "ERROR! Can't to open file\n";


\n

There is std::endl in C++.

asenyaev · 2021-04-08T11:28:27Z

jenkins cn please retry a build

alalek · 2021-04-28T21:49:21Z

modules/videoio/include/opencv2/videoio.hpp

+enum DeviceSatus
+{
+  NONDEVICE = -1,
+  CAMERA = 0,
+  MICROPHONE = 1
+};


Why do users need that in the public API?

alalek · 2021-04-28T21:50:07Z

modules/videoio/include/opencv2/videoio/legacy/constants_c.h

@@ -315,7 +315,6 @@ enum
    CV_CAP_PROP_XI_SENSOR_FEATURE_SELECTOR                      = 585, // Selects the current feature which is accessible by XI_PRM_SENSOR_FEATURE_VALUE.
    CV_CAP_PROP_XI_SENSOR_FEATURE_VALUE                         = 586, // Allows access to sensor feature value currently selected by XI_PRM_SENSOR_FEATURE_SELECTOR.

-


Revert all changes in this file, including changes to empty lines.

alalek · 2021-04-28T21:51:39Z

modules/videoio/src/cap_msmf.cpp

+        res.majorType = MFMediaType_Audio;
+        res.subType = MFAudioFormat_PCM;
+        res.bit_per_sample = 32;
+        res.nChannels = 2;


I believe we should start with handling 'mono' audio streams.

alalek · 2021-04-28T21:55:02Z

modules/videoio/src/cap_msmf.cpp

-                const double thisRateDiff = absDiff(getFramerate(), ref.getFramerate());
-                const double otherRateDiff = absDiff(other.getFramerate(), ref.getFramerate());
-                if (thisRateDiff < otherRateDiff)
+                if (width > other.width)


Please rename the original function to handle video-only data, and create a new one for audio and for the generic case.

Keep patches small, code history clear.

alalek · 2021-04-28T21:56:18Z

modules/videoio/src/cap_msmf.cpp

-        if (thisDiff < otherDiff)
-            return true;
-        if (thisDiff == otherDiff)
+        if(majorType == MFMediaType_Video)


Code format: if<space>()
if is not a function, it is a C++ statement.

The same applies to for().

alalek · 2021-04-29T15:07:30Z

samples/cpp/videocapture_audio_combination.cpp

+    const int audio_base_index = cap.get(cv::CAP_PROP_AUDIO_BASE_INDEX);
+    for (;;)
+    {
+        cap.read(video_frame);


Don't lose the returned results.
Show how to handle them.

alalek · 2021-04-29T15:09:04Z

modules/videoio/include/opencv2/videoio.hpp

@@ -185,11 +185,28 @@ enum VideoCaptureProperties {
       CAP_PROP_ORIENTATION_AUTO=49, //!< if true - rotates output frames of CvCapture considering video file's metadata  (applicable for FFmpeg back-end only) (https://github.com/opencv/opencv/issues/15499)
       CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific.
       CAP_PROP_HW_DEVICE      =51, //!< (**open-only**) Hardware device index (select GPU if multiple available)
+       CAP_PROP_ENABLE_MICROPHONE = 52,


CAP_PROP_ENABLE_MICROPHONE

Why do we need a dedicated property for that?

alalek · 2021-04-29T15:18:40Z

modules/videoio/include/opencv2/videoio.hpp

+       CAP_PROP_AUDIO_SAMPLES_PER_SECOND = 57,
+       CAP_PROP_AUDIO_CHANNELS = 58,
+       CAP_PROP_AUDIO_BASE_INDEX = 59,
+       CAP_PROP_AUDIO_TOTAL_CHANNELS = 60,


CAP_PROP_AUDIO_CHANNELS
CAP_PROP_AUDIO_TOTAL_CHANNELS

Documentation should be added.

alalek · 2021-04-29T15:22:39Z

modules/videoio/src/cap_msmf.cpp

@@ -951,15 +1139,14 @@ bool CvCapture_MSMF::open(const cv::String& _filename, const cv::VideoCapturePar
            }
            else
                duration = 0;
-        }
+        }        


Read the OpenCV contribution guidelines on the Wiki and install the necessary Git hooks.

alalek · 2021-04-29T15:26:03Z

modules/videoio/src/cap_msmf.cpp

-    MediaType captureFormat;
+    MediaType captureVideoFormat;
+    MediaType captureAudioFormat;
+    bool deviceType; // false - camera, true - audio


bool deviceType; // false - camera, true - audio

... and somewhere we have recorded streams too.

This looks overcomplicated.

alalek

Updated the comment with the property proposals too.

alalek · 2021-05-01T04:50:59Z

modules/videoio/src/cap_msmf.cpp

        }
        else
        {
-            cv::Mat(1, cursize, CV_8UC1, ptr, pitch).copyTo(frame);
+            //switch(index)


OpenCV has cv::extractChannel(), which can extract a single channel from interleaved data.
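For reference, this is what extracting one channel from interleaved PCM means, sketched in plain C++ assuming 16-bit samples stored as L0 R0 L1 R1 ... (the helper name is hypothetical). In OpenCV the same result should come from wrapping the buffer as a 1 x N Mat of type CV_16SC(nChannels) and calling cv::extractChannel(src, dst, ch).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Extracts channel ch from interleaved samples laid out as
// ch0 ch1 ... ch(n-1) ch0 ch1 ... (n = nChannels).
std::vector<short> extractInterleavedChannel(const std::vector<short>& interleaved,
                                             int nChannels, int ch)
{
    assert(nChannels > 0 && ch >= 0 && ch < nChannels);
    const std::size_t stride = static_cast<std::size_t>(nChannels);
    std::vector<short> out;
    out.reserve(interleaved.size() / stride);
    for (std::size_t i = static_cast<std::size_t>(ch); i < interleaved.size(); i += stride)
        out.push_back(interleaved[i]);
    return out;
}
```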

Samson-Mayeem · 2021-05-02T16:32:03Z

Can I get an overview of how OpenCV development is organized here, so I can understand the plan going forward for this? I have been busy and could not follow many OpenCV trends. Could someone just make a simple summary?

alalek

All CI builds failed due to a merge conflict.
Squash commits and then rebase on the upstream branch (in that order).

alalek · 2021-06-03T20:46:13Z

samples/cpp/videocapture_audio.cpp

+#include <fstream>
+#include <stdio.h>


fstream
stdio.h

Used?

alalek · 2021-06-03T20:47:31Z

samples/cpp/videocapture_audio.cpp

+    }
+
+    const int audioBaseIndex = cap.get(cv::CAP_PROP_AUDIO_BASE_INDEX);
+    const int numberOfChannels = cap.get(cv::CAP_PROP_AUDIO_TOTAL_CHANNELS);


cv::

All samples use "using namespace cv" to avoid the cv:: prefix.

alalek · 2021-06-03T20:50:28Z

samples/cpp/videocapture_audio.cpp

+                    cout << "Number of samples: " << cap.get(cv::CAP_PROP_AUDIO_POS) << endl;
+                    return 0;


How can we get here?

This is the "error" path.
.grab() should handle the normal end-of-stream case.

alalek · 2021-06-03T20:51:49Z

samples/cpp/videocapture_audio_combination.cpp

+            if (audioFrame.empty() && videoFrame.empty()) 
+            {
+                cerr << "ERROR! blank frame grabbed" << endl;
+                break;


Why didn't .grab() handle that?

alalek · 2021-06-03T20:54:42Z

modules/videoio/test/test_microphone.cpp

+        cap.grab();
+        cap.retrieve(frame, audio_base_index);


missing check of the returned results.

alalek · 2021-06-03T20:55:46Z

modules/videoio/test/test_audio.cpp

+    VideoCapture cap;
+};
+
+class aud : public AudioTestFixture{};


aud

What is the problem with using a normal name?

alalek · 2021-06-03T20:59:00Z

modules/videoio/test/test_audio.cpp

+    void comparison()
+    {
+        for(int i = 0; i < validData.size(); i++)
+            ASSERT_TRUE(fabs(validData[i] - fileData[i]) < epsilon);            


GoogleTest provides ASSERT_NEAR with accurate error reporting.

Also use (...) << i; to dump at least the index.

alalek · 2021-06-03T21:00:31Z

modules/videoio/test/test_audio.cpp

+    void comparison()
+    {
+        for(int i = 0; i < validData.size(); i++)
+            ASSERT_TRUE(fabs(validData[i] - fileData[i]) < epsilon);            


fileData[i]

missing check for buffer overflow.
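In the test itself this would be ASSERT_EQ(validData.size(), fileData.size()); followed by ASSERT_NEAR(validData[i], fileData[i], epsilon) << "sample index " << i; inside the loop. The framework-free sketch below (hypothetical firstMismatch helper) shows the same logic: sizes are checked so fileData[i] is never read out of bounds, and the failing index is reported.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns -1 if all samples match within epsilon, otherwise the index of
// the first mismatch. A size mismatch is reported at the first index that
// exists in only one buffer, so neither buffer is read past its end.
long firstMismatch(const std::vector<double>& validData,
                   const std::vector<double>& fileData, double epsilon)
{
    const std::size_t n = std::min(validData.size(), fileData.size());
    for (std::size_t i = 0; i < n; ++i)
        if (std::fabs(validData[i] - fileData[i]) > epsilon) // same rule as ASSERT_NEAR
            return static_cast<long>(i);
    if (validData.size() != fileData.size())
        return static_cast<long>(n);
    return -1;
}
```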

alalek · 2021-06-03T21:02:17Z

modules/videoio/test/test_audio.cpp

+        for(int j = 0; j < 44100*3; j += 44100)
+        {


Why not from 0 to 3?

ebachard · 2021-07-14T12:28:58Z

Hello,

@MaximMilashchenko: Is it possible to attach a full diff (against master)? I'd like to test all the changes.

Thanks in advance,
@ebachard

alalek · 2021-07-14T12:31:30Z

@ebachard Check links at page footer: "ProTip! Add .patch or .diff to the end of URLs for Git’s plaintext views."

ebachard · 2021-07-14T13:46:21Z

@alalek: thanks for the tip. In fact, in the meantime I created a full diff including the newly created files, and I'll give it a try. First, I'll do a (personal) code review to understand how things work, and then I'll experiment.

Thanks again !

alalek · 2021-09-01T15:23:48Z

modules/videoio/include/opencv2/videoio.hpp

+       CAP_PROP_AUDIO_BASE_INDEX = 60, //!< Number of video channels
+       CAP_PROP_AUDIO_TOTAL_CHANNELS = 61, //!< Number of audio channels in the selected audio stream.
+       CAP_PROP_AUDIO_TOTAL_STREAMS = 62, //!< Number of audio stream in the used media.
+       CAP_PROP_SYNC_LAST_FRAME = 63, //!< Defult value is 1 (the last audio frame is synchronized with the video frame by duration), 0 is no audio and video last frames sync(the last audio frame will contain all remaining audio data. The duration of the received audio data may be longer than the duration of the received video data)


typo: Default

It makes sense to keep AUDIO prefix:

CAP_PROP_SYNC_LAST_FRAME => CAP_PROP_AUDIO_SYNC_LAST_FRAME

alalek · 2021-09-01T15:28:37Z

modules/videoio/include/opencv2/videoio.hpp

+       CAP_PROP_AUDIO_POS = 57, //!< Audio position is measured in samples. Accurate audio sample timestamp of previous grabbed fragment. See CAP_PROP_AUDIO_SAMPLES_PER_SECOND
+       CAP_PROP_AUDIO_DATA_DEPTH = 58, //!< Alternative definition to bits-per-sample, but with clear handling of 32F / 32S
+       CAP_PROP_AUDIO_SAMPLES_PER_SECOND = 59, //!< determined from file/codec input. If not specified, then selected audio sample rate is 44100
+       CAP_PROP_AUDIO_BASE_INDEX = 60, //!< Number of video channels


Number of video channels

(read-only) Index of first audio channel. Used as parameter for .retrieve() call.

Perhaps we should have: CAP_PROP_VIDEO_TOTAL_CHANNELS (currently with the same value)

Agreed, it's not clear why AUDIO_BASE_INDEX is described as the number of video channels.

I would do both (add the total and rename to VIDEO_TOTAL_CHANNELS). Also, the docs need to state clearly that audio channel numbers are not zero-based: they continue the enumeration after the video channels.

What should I do here: rename CAP_PROP_AUDIO_BASE_INDEX => CAP_PROP_VIDEO_TOTAL_CHANNELS, or add a new CAP_PROP_VIDEO_TOTAL_CHANNELS property and change the CAP_PROP_AUDIO_BASE_INDEX description?

CAP_PROP_AUDIO_BASE_INDEX should stay here, with the updated comment (see above).

add new property CAP_PROP_VIDEO_TOTAL_CHANNELS (for RGBD or stereo streams/cameras)
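A sketch of the numbering being discussed, assuming (as noted above) that the audio base index currently equals the number of video channels. It lists the indices that cap.retrieve(frame, audioBaseIndex + nCh) would be called with, as in the microphone sample; audioRetrieveIndices is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Video channels occupy indices 0 .. videoTotalChannels-1; audio channels
// continue the enumeration, so audio channel nCh maps to audioBaseIndex + nCh.
std::vector<int> audioRetrieveIndices(int videoTotalChannels, int audioTotalChannels)
{
    const int audioBaseIndex = videoTotalChannels; // assumption: same value today
    std::vector<int> indices;
    for (int nCh = 0; nCh < audioTotalChannels; ++nCh)
        indices.push_back(audioBaseIndex + nCh);
    return indices;
}
```

For a stereo camera (2 video channels) with stereo audio, the audio channels would be retrieved with indices 2 and 3, not 0 and 1.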

alalek · 2021-09-01T15:33:13Z

modules/videoio/src/cap_msmf.cpp

@@ -69,7 +69,7 @@ static void init_MFCreateDXGIDeviceManager()
 #endif

 #include <mferror.h>
-
+#include <fstream>


Do we use this?

alalek · 2021-09-01T15:33:56Z

modules/videoio/src/cap_msmf.cpp

    virtual bool grabFrame() CV_OVERRIDE;
+    bool retrieveAudioFrame(int, cv::OutputArray);
+    bool retrieveVideoFrame(cv::OutputArray);


cv::

Should not be used in OpenCV code.

The exception is cv::format() (to avoid a clash with std::format()).

In master, cv:: is used with this argument. Is it necessary to remove it?

Remove it from modified/added code.

No need to change unmodified parts.

alalek · 2021-09-01T15:38:04Z

modules/videoio/src/cap_msmf.cpp

+            buf = NULL;
+        }
+        audioSamples.clear();
+        int numberOfByte = (videoStream != -1) ? (int)((double)(curVideoTime/1e7)*captureAudioFormat.nSamplesPerSec*captureAudioFormat.nChannels*(captureAudioFormat.bit_per_sample)/8) : cursize;


curVideoTime/1e7

It is not accurate with integer numbers:

LONGLONG curVideoTime;

Division should be the last operation of the expression (also avoid overflow: int64_t should be enough).

Mat accepts int arguments. If numberOfByte is int64_t, the int64_t-to-int conversion will happen in the Mat constructor anyway.

compute as int64_t

avoid doubles / floating-point computations

check range of final result and cast to "int" if needed
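A sketch of the suggested fix, assuming (as the question about 1e7 suggests) that curVideoTime is a Media Foundation timestamp in 100-nanosecond units, so 10,000,000 units equal one second. Everything is computed in int64_t, the division comes last in the sample-count expression, and the final value is range-checked before the cast to int. The function name is hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Media Foundation time unit: 100 ns, i.e. 10'000'000 units per second.
constexpr int64_t kHnsPerSecond = 10000000;

// Bytes of interleaved PCM covering timeHns, or -1 if the inputs are
// invalid or the result does not fit into an int.
int audioChunkLengthBytes(int64_t timeHns, int64_t samplesPerSec,
                          int64_t nChannels, int64_t bitsPerSample)
{
    if (timeHns < 0 || samplesPerSec <= 0 || nChannels <= 0 || bitsPerSample <= 0)
        return -1;
    const int64_t samples = timeHns * samplesPerSec / kHnsPerSecond; // divide last
    const int64_t bytesPerFrame = nChannels * bitsPerSample / 8;
    const int64_t lengthBytes = samples * bytesPerFrame;
    if (lengthBytes > INT_MAX)
        return -1;
    return static_cast<int>(lengthBytes);
}
```

Computing the sample count first and then multiplying by the frame size keeps the chunk aligned to whole sample frames. The intermediate timeHns * samplesPerSec stays within int64_t for durations of many days at common sample rates.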

alalek · 2021-09-01T15:52:11Z

modules/videoio/test/test_audiocombination.cpp

+                ASSERT_TRUE(cap.retrieve(videoFrame));
+                ASSERT_TRUE(cap.retrieve(audioFrame, audioBaseIndex));


Provide useful messages about failed checks. Add the number of the processed frame.
A failure on frame 0 (nothing works) is a different case from a failure on, say, frame 345 (something works).

Use GoogleTest SCOPED_TRACE for that.

alalek · 2021-09-01T15:53:14Z

modules/videoio/test/test_audiocombination.cpp

+            {
+                ASSERT_TRUE(cap.retrieve(videoFrame));
+                ASSERT_TRUE(cap.retrieve(audioFrame, audioBaseIndex));
+                ASSERT_EQ(audioFrame.cols/44100, 1/fps); // check if the duration of the received audio data satisfies one video frame


Can this check fail?

Yes. If the duration of the received audio data does not equal the duration of one video frame, the check fails. This is possible if synchronization does not work.

The provided code doesn't look valid. It looks like it always compares 0 vs 0.

Use this instead:

int samplesPerFrame = 1/fps*44100;
// 44100 should be checked and replaced with the CAP_PROP_AUDIO_SAMPLES_PER_SECOND property.

int audioSamplesTolerance = samplesPerFrame / 2;

We should have these checks:

position: abs(CAP_PROP_AUDIO_POS - CAP_PROP_POS_MSEC / 1000 * 44100) < audioSamplesTolerance

duration: abs(audioFrame.cols - samplesPerFrame) < audioSamplesTolerance

The CAP_PROP_POS_MSEC value is not always valid
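The two checks proposed above could be written as a small helper, sketched below with the sample rate passed in (standing in for CAP_PROP_AUDIO_SAMPLES_PER_SECOND) instead of the hardcoded 44100. SyncCheck and checkAudioVideoSync are hypothetical names; given the reply above, the position check is only meaningful when CAP_PROP_POS_MSEC is valid.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

struct SyncCheck
{
    bool positionOk;  // audio position matches the video position
    bool durationOk;  // audio chunk covers about one video frame
};

// audioPos: CAP_PROP_AUDIO_POS (samples); videoPosMsec: CAP_PROP_POS_MSEC;
// audioCols: audioFrame.cols for the current frame.
SyncCheck checkAudioVideoSync(double fps, int sampleRate, long long audioPos,
                              double videoPosMsec, int audioCols)
{
    const int samplesPerFrame = static_cast<int>(sampleRate / fps);
    const int tolerance = samplesPerFrame / 2;
    const double expectedPos = videoPosMsec / 1000.0 * sampleRate;
    SyncCheck r;
    r.positionOk = std::abs(static_cast<double>(audioPos) - expectedPos) < tolerance;
    r.durationOk = std::abs(audioCols - samplesPerFrame) < tolerance;
    return r;
}
```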

alalek · 2021-09-01T15:54:15Z

modules/videoio/test/test_audiocombination.cpp

+                ASSERT_EQ(audioFrame.cols/44100, 1/fps); // check if the duration of the received audio data satisfies one video frame
+                if (!videoFrame.empty())
+                    videoData.push_back(videoFrame);
+                for (int i = 0; i < audioFrame.cols; i++)


Add a check for audioFrame.type(), as the code below assumes CV_16SC1.

alalek · 2021-09-01T15:55:13Z

modules/videoio/test/test_audio.cpp

+        }
+        ASSERT_FALSE(fileData.empty());
+    }
+    void comparison()


checkAudio
validateAudio

What does this mean?

A suggestion to use a clearer name.

fzhar · 2021-09-01T20:35:28Z

modules/videoio/src/cap_msmf.cpp

@@ -246,7 +299,7 @@ struct MediaType
        return wdiff + hdiff;
    }
    // check if 'this' is better than 'other' comparing to reference
-    bool isBetterThan(const MediaType& other, const MediaType& ref) const
+    bool VideoIsBetterThan(const MediaType& other, const MediaType& ref) const


Usually video quality is compared not only by resolution, but also by bitrate and FRC mode

fzhar · 2021-09-01T21:32:56Z

modules/videoio/src/cap_msmf.cpp

+            buf = NULL;
+        }
+        audioSamples.clear();
+        int numberOfByte = (videoStream != -1) ? (int)((double)(curVideoTime/1e7)*captureAudioFormat.nSamplesPerSec*captureAudioFormat.nChannels*(captureAudioFormat.bit_per_sample)/8) : cursize;


numberOfByte is confusing. What is it: the start of the data chunk or the chunk's length? Please rename it (using start or length).
What is the magic number 1e7: does it come from MSFT measurements in units of 0.1 us?
The (double) cast is redundant here: 1e7 is a double, so everything else is promoted to double anyway.

fzhar · 2021-09-01T21:34:27Z

modules/videoio/src/cap_msmf.cpp

+            }
+        }
+        _ComPtr<IMFMediaBuffer> buf = NULL;
+        BYTE* useAudioData = NULL;


audioDataBuffer or audioDataInUse? useAudioData sounds like a function name or a boolean flag.

fzhar · 2021-09-01T21:37:14Z

modules/videoio/src/cap_msmf.cpp

+        for (int i = 0; i < numberOfByte; i++)
+        {
+            useAudioData[i] = bufferAudioData[i];
+        }


Why don't we use memcpy_s or std::copy (the latter also works if std::deque is used instead of vector for bufferAudioData)?

fzhar · 2021-09-01T21:51:27Z

modules/videoio/src/cap_msmf.cpp

+    LONGLONG audioSamplePos;
+    DWORD numberOfAudioStreams;
+    Mat audioFrame;
+    std::vector<BYTE> bufferAudioData;


Since this buffer is used as a sample queue, I'd suggest a ring buffer or std::deque to speed it up. This will only be noticeable on long queues, though.
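A minimal sketch of such a queue, assuming decoded bytes are consumed strictly from the front. std::deque removes elements at the front in time proportional to the number removed, while std::vector::erase(begin(), ...) has to shift every remaining byte. The class name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <iterator>
#include <vector>

class AudioByteQueue
{
public:
    // Appends a decoded buffer at the tail.
    void push(const unsigned char* data, std::size_t size)
    {
        buffer_.insert(buffer_.end(), data, data + size);
    }

    // Moves up to length bytes from the head into out; returns the count.
    std::size_t pop(std::size_t length, std::vector<unsigned char>& out)
    {
        const std::size_t n = std::min(length, buffer_.size());
        out.clear();
        out.reserve(n);
        const auto last = buffer_.begin() + static_cast<std::ptrdiff_t>(n);
        std::copy(buffer_.begin(), last, std::back_inserter(out));
        buffer_.erase(buffer_.begin(), last);
        return n;
    }

    std::size_t size() const { return buffer_.size(); }

private:
    std::deque<unsigned char> buffer_;
};
```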

fzhar · 2021-09-01T21:52:55Z

modules/videoio/src/cap_msmf.cpp

+
+            if (!SUCCEEDED(buf->Lock(&ptr, &maxsize, &cursize)))
+                break;
+            for (unsigned int i = 0; i < cursize; i++)


Call .reserve() first to speed it up, or resize and just fill the data at the tail.
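Two ways to avoid the per-byte loop, sketched with ptr and cursize standing for the values returned by IMFMediaBuffer::Lock() in the diff; the function names are hypothetical. A range insert() with pointer iterators already knows the element count, so it reallocates at most once per call; reserve() pays off mainly when the total size is known up front, since repeatedly reserving the exact new size can cause a reallocation on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Option 1: one bulk range insert at the tail.
void appendByInsert(std::vector<unsigned char>& queue,
                    const unsigned char* ptr, std::size_t cursize)
{
    queue.insert(queue.end(), ptr, ptr + cursize);
}

// Option 2: resize, then copy the new data into the tail.
void appendByResize(std::vector<unsigned char>& queue,
                    const unsigned char* ptr, std::size_t cursize)
{
    const std::size_t oldSize = queue.size();
    queue.resize(oldSize + cursize);
    if (cursize > 0)
        std::memcpy(queue.data() + oldSize, ptr, cursize);
}
```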

fzhar · 2021-09-01T22:40:06Z

samples/cpp/videocapture_audio_combination.cpp

+            }
+            if (!audioFrame.empty())
+            {
+                audioData.push_back(audioFrame);


Same as above: it would be good to do something with the decoded data. Since we have live video, you could draw an oscillogram of the decoded chunk over the video frame.

@fzhar Audio writer is out of scope of this PR. It would be added later.

fzhar · 2021-09-01T22:40:46Z

samples/cpp/videocapture_microphone.cpp

+            for (int nCh = 0; nCh < numberOfChannels; nCh++)
+            {
+                cap.retrieve(frame, audioBaseIndex+nCh);
+                audioData.push_back(frame);


Same as above. Writing audio to a file would be nice.

fzhar · 2021-09-01T23:28:13Z

modules/videoio/src/cap_msmf.cpp

+{
+    DWORD streamIndex,  flags;
+    HRESULT hr;
+    std::vector<bool> outInstalFlag;


Why do we need a vector of flags here? In the end they are just AND-ed; maybe it's better to use a single boolean flag instead of a vector?

fzhar · 2021-09-01T23:29:47Z

modules/videoio/src/cap_msmf.cpp

+    }
+    else if (isOpen)
+    {
+        std::vector<bool> outInstalFlag;


The vector is really unnecessary here. Just use a single bool.

alalek · 2021-09-06T17:15:07Z

modules/videoio/src/cap_msmf.cpp

+        }
+        else
+        {
+            sampleTime += frameStep;


sampleTime += frameStep;

This may provide inaccurate values.
Use the timestamp value from ReadSample. See the code related to m_lastSampleTimestamp.

The timestamp value from ReadSample does not always indicate the end of the captured sample. For example, in the test case {"mov", "H264", 30.f, CAP_MSMF} (videoio_synthetic, write_read_position), the timestamp in ReadSample is set at the beginning of the captured frame, so for correct operation it is necessary to compute sampleTime += frameStep.

the end of the captured sample
end

This is not needed at all and can NOT be properly determined for VFR videos (without reading the next frame).

Use provided value from the decoder/demuxer.
Compare results with FFmpeg backend.

alalek · 2021-09-06T17:15:11Z

modules/videoio/src/cap_msmf.cpp

        case CV_CAP_PROP_POS_FRAMES:
-            return floor(((double)sampleTime / 1e7)* captureFormat.getFramerate() + 0.5);
+            return floor(((double)sampleTime / 1e7)* captureVideoFormat.getFramerate() + 0.5);


Use a direct frame counter by counting grab() / ReadSample() calls.

Also update the ::setTime() method. Seeking to zero should be accurate; other seeks are not (perhaps we should not return the property value after them).
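A sketch of the direct counter, assuming the backend calls a hook after every successful ReadSample() and on seek to zero. The class and method names are hypothetical; the 100-ns timestamp is stored as reported by the decoder/demuxer rather than accumulated from frameStep.

```cpp
#include <cassert>
#include <cstdint>

class FramePositionTracker
{
public:
    // Called once per successful grab with the timestamp from ReadSample.
    void onGrabSucceeded(int64_t sampleTimeHns)
    {
        ++frameIndex_;
        lastTimestampHns_ = sampleTimeHns;
    }

    // Seeking to zero is exact, so the counter can be reset reliably.
    void onSeekToZero()
    {
        frameIndex_ = 0;
        lastTimestampHns_ = 0;
    }

    // Number of frames grabbed so far (index of the next frame).
    int64_t posFrames() const { return frameIndex_; }

    // 100 ns units -> milliseconds.
    double posMsec() const { return lastTimestampHns_ / 1e4; }

private:
    int64_t frameIndex_ = 0;
    int64_t lastTimestampHns_ = 0;
};
```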

alalek · 2021-09-27T21:08:21Z

modules/videoio/test/test_audio.cpp

+typedef std::tuple<std::string, double, int, int, int, int, int, double, std::pair<std::string, int> > paramCombination;
+typedef std::tuple<std::string, double, std::pair<std::string, int> > param;
+
+class baseAudio


baseAudio must start with a capital letter

baseAudio => AudioBaseTest

alalek · 2021-09-27T21:09:45Z

modules/videoio/test/test_audio.cpp

+    void getValidAudioData()
+    {
+        const double step = 3.14/22050;
+        double value = 0;
+        for (int j = 0; j < 3; j++)
+        {
+            value = 0;
+            for (int i = 0; i < 44100; i++)
+            {
+                validAudioData.push_back(sin(value));
+                value += step;
+            }
+        }
+    }


code duplication

alalek · 2021-09-27T21:10:06Z

modules/videoio/test/test_audio.cpp

+    void comparisonAudio()
+    {
+        for (unsigned int i = 0; i < audioData.size(); i++)
+        {
+            EXPECT_LE(fabs(validAudioData[i] - audioData[i]), epsilon) << "sample index " << i;
+        }
+    }


code duplication

alalek · 2021-09-27T21:10:56Z

modules/videoio/test/test_audio.cpp

+            EXPECT_LE(fabs(validAudioData[i] - audioData[i]), epsilon) << "sample index " << i;
+        }
+    }
+    void comparisonVideo()


comparisonVideo => checkVideoFrames

alalek · 2021-09-27T21:11:43Z

modules/videoio/test/test_audio.cpp

+    void getValidAudioData()
+    {
+        const double step = 3.14/22050;
+        double value = 0;
+        for(int j = 0; j < 3; j++)
+        {
+            value = 0;
+            for(int i = 0; i < 44100; i++)
+            {
+                validAudioData.push_back(sin(value));
+                value += step;
+            }
+        }
+    }


code duplication
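One way to remove the duplication is a single helper shared by the test fixtures. The sketch below (hypothetical name and defaults) keeps the original recipe, including the 3.14 / 22050 step and the phase restart every 44100 samples, so it should reproduce the current reference data; the outer loop runs over blocks 0..2, as asked earlier in the review.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Generates `blocks` blocks of samplesPerBlock sine samples; the phase
// restarts at 0 for each block, exactly like the duplicated loops.
std::vector<double> makeReferenceSine(int blocks = 3, int samplesPerBlock = 44100,
                                      double step = 3.14 / 22050)
{
    std::vector<double> data;
    data.reserve(static_cast<std::size_t>(blocks) * samplesPerBlock);
    for (int block = 0; block < blocks; ++block)
    {
        double value = 0;
        for (int i = 0; i < samplesPerBlock; ++i)
        {
            data.push_back(std::sin(value));
            value += step; // accumulate as the original code does
        }
    }
    return data;
}
```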

alalek · 2021-09-27T21:21:29Z

samples/cpp/videocapture_audio.cpp

+            int numberOfSamles = 0;
+            for (auto item : audioData)
+                numberOfSamles+=item.cols;
+            cout << "Number of samples: " << numberOfSamles << endl;


Move the post-processing code out of the for loop.

Keep for loops readable: as minimal as possible.

alalek · 2021-09-27T21:23:01Z

samples/cpp/videocapture_microphone.cpp

+    cap.open(0, CAP_MSMF, params);
+    if (!cap.isOpened())
+    {
+        cerr << "ERROR! Can't to open file" << endl;


file

there is no file

alalek · 2021-09-27T21:25:21Z

samples/cpp/videocapture_microphone.cpp

+    const double cvTickFreq = getTickFrequency();
+    int64 sysTimeCurr = getTickCount();
+    int64 sysTimePrev = sysTimeCurr;
+    while ((sysTimeCurr-sysTimePrev)/cvTickFreq < 10)


It makes sense to print a message that audio will be captured for the next 10 seconds,
to avoid confusion with a program hang.

alalek · 2021-09-27T21:26:40Z

samples/cpp/videocapture_microphone.cpp

+        if (cap.grab())
+        {


"else" handling is missing.

What if the user disconnects the microphone device?
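A sketch covering both points above, with a std::function<bool()> standing in for cv::VideoCapture::grab() so it runs without a device (captureFor is a hypothetical name): the capture window is announced first, and a failed grab ends the loop with an error instead of being silently ignored.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <iostream>

// Returns the number of grabbed chunks, or -1 if a grab failed.
int captureFor(std::chrono::milliseconds window, const std::function<bool()>& grab)
{
    std::cout << "Capturing audio for the next "
              << std::chrono::duration_cast<std::chrono::seconds>(window).count()
              << " seconds..." << std::endl;
    const auto start = std::chrono::steady_clock::now();
    int grabbed = 0;
    while (std::chrono::steady_clock::now() - start < window)
    {
        if (grab())
        {
            ++grabbed; // retrieve and process the audio chunk here
        }
        else
        {
            std::cerr << "ERROR! Can't grab audio frame (device disconnected?)" << std::endl;
            return -1;
        }
    }
    return grabbed;
}
```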

alalek · 2021-09-27T21:29:15Z

samples/cpp/videocapture_audio_combination.cpp

+            }
+            if (!audioFrame.empty())
+            {
+                audioData.push_back(audioFrame);


@fzhar Audio writer is out of scope of this PR. It would be added later.

alalek · 2021-09-30T23:35:33Z

modules/videoio/src/cap_msmf.cpp

+            try
+            {
+                if (chunkLengthOfBytes < INT_MIN || chunkLengthOfBytes > INT_MAX)
+                    throw "The chunkLengthOfBytes is out of the allowed range";
+            }
+            catch (const std::exception& e)
+            {
+                CV_LOG_WARNING(NULL, "MSMF: Exception is raised: " << e.what());
+                return false;
+            }


This can't work properly.

How can I fix that?

Look for similar code in the library. Use CV_Check.
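The reason it can't work: throw "..." throws a const char*, which is not derived from std::exception, so the catch block never runs and the exception leaves the function. The plain C++ sketch below reproduces that and shows a simple range check instead (hypothetical function names). In the library itself the CV_Check* macros (for example CV_CheckLE from opencv2/core/check.hpp) are the usual tool; they throw cv::Exception, which does derive from std::exception.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <exception>

// Reproduces the reviewed pattern: the handler below is never reached.
bool originalPattern(int64_t chunkLengthOfBytes)
{
    try
    {
        if (chunkLengthOfBytes < INT_MIN || chunkLengthOfBytes > INT_MAX)
            throw "The chunkLengthOfBytes is out of the allowed range"; // const char*
    }
    catch (const std::exception&)
    {
        return false; // unreachable for the throw above
    }
    return true;
}

// Plain range check. A length can never be negative, so the lower bound
// is 0 rather than INT_MIN.
bool isValidChunkLength(int64_t chunkLengthOfBytes)
{
    return chunkLengthOfBytes >= 0 && chunkLengthOfBytes <= INT_MAX;
}
```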

alalek · 2021-10-01T00:19:44Z

modules/videoio/src/cap_msmf.cpp

+            }
+            copy(bufferAudioData.begin(), bufferAudioData.begin()+chunkLengthOfBytes, std::back_inserter(audioDataInUse));
+            bufferAudioData.erase(bufferAudioData.begin(), bufferAudioData.begin()+chunkLengthOfBytes);
+            residualTime = (double)(bufferAudioData.size()/((captureAudioFormat.bit_per_sample/8)*captureAudioFormat.nChannels))/captureAudioFormat.nSamplesPerSec;


residualTime

The logic here is wrong.
.retrieve() must have an idempotent implementation: it can be called many times with the same parameters, or not called at all.
It is wrong to modify the residualTime variable here.

State must be changed only during the .grab() call.
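The requested contract can be sketched with a hypothetical `AudioReader` class. `grab()` is the only method that changes state. `retrieve()` is `const`, so calling it zero, one, or many times has no side effects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class AudioReader
{
public:
    explicit AudioReader(std::vector<short> source) : buffer_(std::move(source)) {}

    // Moves up to `chunkSamples` samples from the internal buffer into the
    // current chunk. This is the only place where state changes.
    bool grab(std::size_t chunkSamples)
    {
        if (buffer_.empty())
            return false;
        const auto n = static_cast<std::ptrdiff_t>(std::min(chunkSamples, buffer_.size()));
        current_.assign(buffer_.begin(), buffer_.begin() + n);
        buffer_.erase(buffer_.begin(), buffer_.begin() + n);
        return true;
    }

    // Idempotent: const, so it cannot modify any member.
    std::vector<short> retrieve() const { return current_; }

    std::size_t buffered() const { return buffer_.size(); }

private:
    std::vector<short> buffer_;   // samples not yet grabbed
    std::vector<short> current_;  // samples from the last successful grab()
};
```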

alalek · 2021-10-01T00:22:12Z

modules/videoio/src/cap_msmf.cpp

+        bool returnFlag = true;
+        if (videoStream != -1)
+            returnFlag &= grabVideoFrame();
+        if (audioStream != -1)
+            returnFlag &= grabAudioFrame();


If we fail to capture video, then there is no reason to capture audio. We already failed.
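The following sketch short-circuits the combined grab. It assumes hypothetical callbacks in place of `grabVideoFrame()`/`grabAudioFrame()` and uses the diff's convention that `-1` means "stream not selected".

```cpp
#include <cassert>
#include <functional>

// If the video grab fails, return immediately instead of still reading audio.
bool grabStreams(int videoStream, int audioStream,
                 const std::function<bool()>& grabVideoFrame,
                 const std::function<bool()>& grabAudioFrame)
{
    if (videoStream != -1 && !grabVideoFrame())
        return false;
    if (audioStream != -1 && !grabAudioFrame())
        return false;
    return true;
}
```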

alalek · 2021-10-01T00:33:34Z

modules/videoio/src/cap_msmf.cpp

                0,             // Flags.
                &streamIndex,  // Receives the actual stream index.
                &flags,        // Receives status flags.
-                &sampleTime,   // Receives the time stamp.
-                &videoSample   // Receives the sample or NULL.
+                NULL,   // Receives the time stamp.


Why does audio ignore the timestamp?

This is why the current logic fails on sample_960x400_ocean_with_audio.mp4.

Add a debug dump of the values from ReadSample:

CvCapture_MSMF::initStream Init stream 1 with MediaType (960x400 @ 23.976) MFVideoFormat_RGB32
CvCapture_MSMF::initStream Init stream 0 with MediaType (0x0 @ 1)
CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=1668332
CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=213333
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=206349
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=419682
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379
...
CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=2085415
CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=633061
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=846441
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213152
...
CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=2502498
CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=1059593
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=1272972
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379

The audio stream starts 4+ frames before the video. We need to capture all of this audio data and return it with the first video frame.
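The ReadSample timestamps can tell us how much audio precedes a given video frame, so that audio can be buffered and returned with the frame instead of being lost. The sketch below shows one way to do this. Media Foundation times are in 100-nanosecond units, so 417083 is about 41.7 ms, one frame at 23.976 fps. `TimedSample` and the helper names are hypothetical. The test values are taken from the debug dump in this comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One audio buffer as reported by ReadSample (timestamp) and
// GetSampleDuration (duration), both in 100-nanosecond units.
struct TimedSample
{
    std::int64_t time;
    std::int64_t duration;
};

// Number of leading audio buffers that start before `videoTime`.
std::size_t audioSamplesBefore(const std::vector<TimedSample>& audio, std::int64_t videoTime)
{
    std::size_t n = 0;
    while (n < audio.size() && audio[n].time < videoTime)
        ++n;
    return n;
}

// Total duration (100-ns units) of those leading buffers, i.e. how much audio
// must be kept and handed out together with the video frame at `videoTime`.
std::int64_t leadingAudioDuration(const std::vector<TimedSample>& audio, std::int64_t videoTime)
{
    const std::size_t n = audioSamplesBefore(audio, videoTime);
    std::int64_t total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += audio[i].duration;
    return total;
}
```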

Audio ignored the timestamp because there was no need to use it. I will fix it.

alalek · 2021-10-01T00:41:37Z

modules/videoio/src/cap_msmf.cpp

+bool CvCapture_MSMF::setTime(int numberFrame)
+{
+    if(videoStream == -1)
+        return false;
+    PROPVARIANT var;


We should forbid seeking in case of Video+Audio capture, except for seeking to the beginning of the media file.

Other values are not accurate even in video-only cases.

The SetCurrentPosition method does not guarantee exact seeking. The accuracy of the seek depends on the media content.

https://docs.microsoft.com/en-us/windows/win32/api/mfreadwrite/nf-mfreadwrite-imfsourcereader-setcurrentposition

We can manually rewind to any frame by calling SetCurrentPosition with 0, then calling grab and releasing frames until we reach the desired one.

call grab and release the frames until we reach the desired one

Then the next step will be a bug report about why the seeking implementation is so slow.
If we want to implement a feature, we should implement it in the best way. Otherwise we get unresolvable threads like this one: #9053

We should forbid seeking in case of Video+Audio capturing, except seek to beginning of the media file.

Then I can forbid setting CV_CAP_PROP_POS_FRAMES and CV_CAP_PROP_POS_MSEC in case of Video+Audio capture, and allow setting CV_CAP_PROP_POS_AVI_RATIO only to the values 0 and 1.

except seek to beginning of the media file.

Seek to zero should be supported for all modes (this case is well defined and just works).
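The policy agreed above can be reduced to a predicate. `isSeekAllowed()` is a hypothetical helper, not part of the MSMF backend. Seeking to 0 is accepted in every mode. Any other target is rejected when video and audio are captured together.

```cpp
#include <cassert>

// A seek to the beginning (position 0) is accepted in every mode; any other
// target is rejected while video and audio are captured together, because
// SetCurrentPosition() does not guarantee exact seeking.
bool isSeekAllowed(bool videoEnabled, bool audioEnabled, double position)
{
    if (position == 0.0)
        return true;
    return !(videoEnabled && audioEnabled);
}
```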

alalek · 2021-10-01T00:55:14Z

modules/videoio/test/test_audio.cpp

+                if (!videoFrame.empty())
+                    videoData.push_back(videoFrame);


videoFrame.empty()

This must always be false when .grab() succeeds.

The current last-frame synchronization logic doesn't work properly.

alalek · 2021-10-01T00:57:46Z

modules/videoio/test/test_audio.cpp

+
+const paramCombination mediaParams[] =
+{
+    paramCombination("mp4", 1, 0.15, CV_8UC3, 240, 320, 90, 30, 30., {"CAP_MSMF", cv::CAP_MSMF})


"CAP_MSMF", cv::CAP_MSMF

Which other tests use the same scheme? Why do we need a string here?

alalek · 2021-10-01T00:58:37Z

modules/videoio/test/test_audio.cpp

+                }
+                ASSERT_LT(abs(cap.get(CAP_PROP_AUDIO_POS) -  (cap.get(CAP_PROP_POS_MSEC)/ 1000  + 1./fps) * samplePerSecond), audioSamplesTolerance);
+                ASSERT_LT(abs(audioFrame.cols - samplesPerFrame), audioSamplesTolerance);
+                ASSERT_EQ(CV_16SC1, audioFrame.type());


The type check is placed after the data is used.

alalek · 2021-10-01T01:01:36Z

modules/videoio/test/test_audio.cpp

+                    }
+                }
+                ASSERT_LT(abs(cap.get(CAP_PROP_AUDIO_POS) -  (cap.get(CAP_PROP_POS_MSEC)/ 1000  + 1./fps) * samplePerSecond), audioSamplesTolerance);
+                ASSERT_LT(abs(audioFrame.cols - samplesPerFrame), audioSamplesTolerance);


audioFrame.cols must be the same across channels.

audioFrame.cols should not be checked for the first frame (or several no-audio frames at the beginning) or for the last frame (due to last-frame synchronization, which reads audio until the end).

Do I need to check that audioFrame.cols is the same across channels?

Yes, the test should check that.
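The sketch below runs the per-frame checks in the order requested in this thread. The type is checked first, before any data is used. Then it checks for equal column counts across channels. Finally it applies the samples-per-frame tolerance, which is skipped for the first and last frames. `ChannelFrame` is a hypothetical stand-in for a retrieved `cv::Mat`. The real test would use gtest assertions rather than returning a bool.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical per-channel view of one retrieved audio frame: `type` stands
// in for audioFrame.type() and `cols` for audioFrame.cols.
struct ChannelFrame
{
    int type;
    int cols;
};

bool frameLooksValid(const std::vector<ChannelFrame>& channels, int expectedType,
                     int frameIndex, int numberOfFrames,
                     int samplesPerFrame, int tolerance)
{
    if (channels.empty())
        return false;
    for (const ChannelFrame& ch : channels)  // 1. type, before any data use
        if (ch.type != expectedType)
            return false;
    for (const ChannelFrame& ch : channels)  // 2. same cols in every channel
        if (ch.cols != channels[0].cols)
            return false;
    // 3. size tolerance, not applied to the first and last frames
    const bool edgeFrame = frameIndex == 0 || frameIndex == numberOfFrames - 1;
    if (!edgeFrame && std::abs(channels[0].cols - samplesPerFrame) >= tolerance)
        return false;
    return true;
}
```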

alalek · 2021-10-01T01:04:31Z

modules/videoio/test/test_audio.cpp

+                        audioData[nCh].push_back(f);
+                    }
+                }
+                ASSERT_LT(abs(cap.get(CAP_PROP_AUDIO_POS) -  (cap.get(CAP_PROP_POS_MSEC)/ 1000  + 1./fps) * samplePerSecond), audioSamplesTolerance);


To make this check work, we need one more read-only property that contains the timestamp shift between the audio and video streams (of the first captured data).

The timestamp shift should be in nanoseconds.

MaximMilashchenko · 2021-10-01T17:13:52Z

How should the pipeline behave when the audio stream is shorter than the video stream, that is, when the audio has an offset at the beginning and may end earlier?

alalek · 2021-10-01T17:56:04Z

The last frames should return empty audio data. Audio data for the first frame is larger than for the others (because audio starts before video).

alalek

Please run the test locally with:

paramCombination("sample_960x400_ocean_with_audio.mp4", 2, 0.15, CV_8UC3, 400, 960, 1116, 30, 30., cv::CAP_MSMF)

Several test conditions still fail (besides the audio gold results).

alalek · 2021-10-15T03:03:27Z

modules/videoio/test/test_audio.cpp

+        for (unsigned int nCh = 0; nCh < audioData.size(); nCh++)
+            for (unsigned int i = 0; i < audioData[nCh].size(); i++)
+            {
+                EXPECT_LE(fabs(validAudioData[nCh][i] - audioData[nCh][i]), epsilon) << "sample index " << i;


Again, if readFile() fails or reads more data, then this code performs an out-of-bounds buffer access.

nCh < audioData.size()

Check constraints based on input data are logically incorrect. Gold values and test parameters must be used instead.
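Here is a sketch of a gold-driven comparison. Loop bounds come from the expected data, and the captured sizes are checked before any element access, so a failed or short read cannot run past the end of a buffer. `matchesGold()` is a hypothetical helper. Requiring at least as many captured samples as gold samples is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

bool matchesGold(const std::vector<std::vector<double>>& gold,
                 const std::vector<std::vector<double>>& captured,
                 double epsilon)
{
    if (captured.size() != gold.size())  // channel count comes from the gold data
        return false;
    for (std::size_t nCh = 0; nCh < gold.size(); ++nCh)
    {
        // Size check before any element access.
        if (captured[nCh].size() < gold[nCh].size())
            return false;
        for (std::size_t i = 0; i < gold[nCh].size(); ++i)
            if (std::fabs(gold[nCh][i] - captured[nCh][i]) > epsilon)
                return false;
    }
    return true;
}
```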

alalek · 2021-10-15T03:05:23Z

modules/videoio/test/test_audio.cpp

+                if (nCh == 0)
+                    audioFrameCols = audioFrame.cols;
+                else
+                    ASSERT_EQ(audioFrameCols, audioFrame.cols);


Add << nCh; or use SCOPED_TRACE with the nCh value.

alalek · 2021-10-15T03:17:37Z

modules/videoio/test/test_audio.cpp

+            if (!audioFrame.empty())
+                if (frame != numberOfFrames-1)
+                {
+                    ASSERT_LT(abs(cap.get(CAP_PROP_AUDIO_POS) + (cap.get(CAP_PROP_TIME_SHIFT_STREAMS)/ 1e6) * samplePerSecond -  (cap.get(CAP_PROP_POS_MSEC)/ 1000  + 1./fps) * samplePerSecond), audioSamplesTolerance);


This condition fails with:

paramCombination("sample_960x400_ocean_with_audio.mp4", 2, 0.15, CV_8UC3, 400, 960, 1116, 30, 30., cv::CAP_MSMF)

See audioSamplePos comment above.

1./fps

This is wrong. Both positions must point to the beginning of the data. You just can't estimate the end for VFR (variable frame rate) videos.
The check must look like this:

EXPECT_NEAR( cap.get(CAP_PROP_AUDIO_POS), ((cap.get(CAP_PROP_POS_MSEC) - cap.get(CAP_PROP_TIME_SHIFT_STREAMS) / 1e6) / 1000) * samplePerSecond, audioSamplesTolerance) << "cap.get(CAP_PROP_POS_MSEC)=" << cap.get(CAP_PROP_POS_MSEC);
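Spelled out as a formula, the proposed check says that the audio position (in samples) and the video position (in milliseconds) must refer to the same instant once the stream shift is removed. Here $f_s$ is samplePerSecond and $\Delta$ is cap.get(CAP_PROP_TIME_SHIFT_STREAMS) in nanoseconds. This restates only the check proposed here, not the final form adopted later in the thread.

```latex
\mathrm{AUDIO\_POS} \approx \frac{\mathrm{POS\_MSEC} - \Delta/10^{6}}{1000}\, f_s
\quad\Longleftrightarrow\quad
\frac{\mathrm{AUDIO\_POS}}{f_s} + \Delta \cdot 10^{-9} \approx \mathrm{POS\_MSEC} \cdot 10^{-3}
```

Both sides are start-of-data positions, which is why no $1/\mathrm{fps}$ term appears.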

This is how the check should look, but only for the first frame. After the first frame is read, the audio and video timestamps no longer need to be aligned with the offset taken into account:

if (frame == 0)
{
    EXPECT_NEAR(
        cap.get(CAP_PROP_AUDIO_POS),
        ((cap.get(CAP_PROP_POS_MSEC) + cap.get(CAP_PROP_AUDIO_SHIFT_NSEC) / 1e6) / 1000) * samplePerSecond,
        audioSamplesTolerance)
        << "CAP_PROP_POS_MSEC = " << cap.get(CAP_PROP_POS_MSEC);
}
else
{
    EXPECT_NEAR(cap.get(CAP_PROP_AUDIO_POS), (cap.get(CAP_PROP_POS_MSEC) / 1000) * samplePerSecond, audioSamplesTolerance)
        << "CAP_PROP_POS_MSEC = " << cap.get(CAP_PROP_POS_MSEC);
    EXPECT_LT(abs(audioFrame.cols - samplesPerFrame), audioSamplesTolerance);
}

No, CAP_PROP_AUDIO_SHIFT_NSEC must be applied for all frames.

The frame == 0 check is EXPECT_EQ(0, cap.get(CAP_PROP_AUDIO_POS));

No, CAP_PROP_AUDIO_SHIFT_NSEC must be applied for all frames.

The calculations will be wrong. From CAP_PROP_POS_MSEC you need to subtract the audio offset relative to the zero point of the media file, not the time difference between the start of the audio stream and the start of the video stream.

The check is updated. Please fetch the latest commit.

alalek

Thank you for update!

alalek · 2021-10-18T18:01:25Z

modules/videoio/src/cap_msmf.cpp

+                        nFrame++;
+                        usedVideoSample->GetSampleDuration(&videoSampleDuration);
+                        requiredAudioTime = usedVideoSampleTime + videoSampleDuration - allTime;
+                        allTime += requiredAudioTime;


allTime

Unclear name for a variable that is used for audio only.

alalek · 2021-10-18T18:02:16Z

modules/videoio/src/cap_msmf.cpp

+    if (residualTime*1e7 > requiredAudioTime)
+        return true;
+    while ((!vEOS) ? bufferedAudioDuration <= requiredAudioTime : !aEOS)


residualTime

This is a kind of duplicate of bufferedAudioDuration (but with an unclear name).

alalek · 2021-10-18T18:06:40Z

modules/videoio/src/cap_msmf.cpp

+    if (returnFlag)
+        if (!audioSamples.empty() || !bufferAudioData.empty() && aEOS)
+        {


No need to indent such huge code blocks:

if (!returnFlag)
{
    CV_LOG_DEBUG(...);
    return false;
}
... code block ...

alalek · 2021-10-18T18:09:52Z

modules/videoio/src/cap_msmf.cpp

+                if (!audioSamples.back())
+                    audioSamples.pop_back();


Just don't write into the array directly through &audioSamples[numberOfSamples].

It is better to write into a local variable and register it only on success, after all checks have passed.
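The local-then-register pattern is sketched below with a hypothetical `Sample` type. In the PR, the container holds Media Foundation sample pointers. Validation happens on a local value, and the container is modified only after every check has passed, so no `pop_back()` cleanup is needed.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sample type standing in for a Media Foundation sample.
struct Sample
{
    int bytes;  // payload size
};

bool registerIfValid(std::vector<std::unique_ptr<Sample>>& samples, std::unique_ptr<Sample> candidate)
{
    if (!candidate)             // nothing was read
        return false;
    if (candidate->bytes <= 0)  // read returned, but the payload is unusable
        return false;
    samples.push_back(std::move(candidate));  // commit only after all checks
    return true;
}
```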

alalek · 2021-10-18T18:12:06Z

modules/videoio/src/cap_msmf.cpp

-                sampleTime += frameStep;
+                audioSamples[numberOfSamples]->GetSampleDuration(&audioSampleDuration);
+                CV_LOG_DEBUG(NULL, "videoio(MSMF): got audio frame with timestamp=" << audioSampleTime << "  duration=" << audioSampleDuration);
+                bufferedAudioDuration += (LONGLONG)(audioSampleDuration + residualTime*1e7);


residualTime*1e7

Added multiple times in the loop (unexpected shift):

while ((!vEOS) ? bufferedAudioDuration <= requiredAudioTime : !aEOS)
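To make the unexpected shift concrete, the sketch below compares the two accumulation patterns using hypothetical helpers. Durations are in 100-ns units, like `residualTime*1e7` in the diff. In the buggy version, the residual is counted once per loop iteration instead of once in total.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Pattern from the diff: the residual is added on every iteration.
std::int64_t bufferedDurationBuggy(std::int64_t residual, const std::vector<std::int64_t>& durations)
{
    std::int64_t buffered = 0;
    for (std::int64_t d : durations)
        buffered += d + residual;
    return buffered;
}

// Fixed pattern: start from the residual once, then add sample durations.
std::int64_t bufferedDurationFixed(std::int64_t residual, const std::vector<std::int64_t>& durations)
{
    std::int64_t buffered = residual;
    for (std::int64_t d : durations)
        buffered += d;
    return buffered;
}
```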

alalek · 2021-10-18T18:14:34Z

modules/videoio/src/cap_msmf.cpp

+                if (videoStream == -1)
+                    break;


Shouldn't the "while()" condition handle this?

alalek · 2021-10-18T18:16:57Z

modules/videoio/test/test_audio.cpp

+    void checkAudio()
+    {
+        for (unsigned int nCh = 0; nCh < audioData.size(); nCh++)
+            for (unsigned int i = 0; i < validAudioData[nCh].size(); i++)


The audioData[nCh].size() check is missing.

No need to have 2 identical checkAudio() implementations.

alalek · 2021-10-18T18:19:05Z

modules/videoio/test/test_audio.cpp

+#if 0
+    // https://filesamples.com/samples/video/mp4/sample_960x400_ocean_with_audio.mp4
+    , paramCombination("sample_960x400_ocean_with_audio.mp4", 2, 0.15, CV_8UC3, 400, 960, 1116, 0, 30, 30., cv::CAP_MSMF)
+#endif


This test case has an audio-video desync starting from frame 666, and the desync grows over time.

Starting from frame 666, the audio position begins to outpace the video by more than (1.0 / fps) * 0.3. This happens because each chunk of output audio data must have a size in bytes that is a multiple of the size of one multi-channel sample (bytes per sample × number of channels):

chunkLengthOfBytes +=
((int)(captureAudioFormat.bit_per_sample)/8* (int)captureAudioFormat.nChannels) - chunkLengthOfBytes %
((int)(captureAudioFormat.bit_per_sample)/8* (int)captureAudioFormat.nChannels);

For this reason, the gap between the video and audio positions grows with each iteration. On sufficiently long media files the positions become desynchronized, while the amount of audio data returned with each frame stays the same. This is confirmed by the check:
EXPECT_NEAR(audioFrame.cols, samplesPerFrame, audioSamplesTolerance);

It is not necessary to return chunks of audio of the same size.

but the amount of output audio data with a frame will be unchanged

This doesn't make any sense for videos with a variable frame rate.

Chunks of audio of the same size are returned only if the frame rate is constant. Chunks of audio are calculated based on the duration of the frame.

Chunks of audio are calculated based on the duration of the frame

Chunks of audio should be calculated based on the timestamp of the next video frame.
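The following sketch derives each chunk from the timestamp of the next video frame. It assumes a hypothetical per-channel sample rate and 100-ns timestamps. Each chunk is "all samples up to the next frame's timestamp" minus what was already returned. Because of this, rounding never accumulates, and variable frame rates are handled naturally. Counting in whole samples also keeps every chunk an exact multiple of the bytes per multi-channel sample. As a side note, the rounding step quoted earlier (`chunk += b - chunk % b`, with `b` the bytes per multi-channel sample) adds a full extra `b` even when `chunk` is already a multiple of `b`, which is consistent with the growing offset described above.

```cpp
#include <cassert>
#include <cstdint>

// Whole samples (per channel) between media time 0 and `time100ns`;
// Media Foundation time has 10'000'000 units per second.
std::int64_t samplesUpTo(std::int64_t time100ns, std::int64_t sampleRate)
{
    return time100ns * sampleRate / 10000000;
}

// Samples to return with the current video frame: everything up to the
// timestamp of the NEXT video frame, minus what was already returned.
std::int64_t chunkForFrame(std::int64_t nextVideoTime100ns, std::int64_t sampleRate,
                           std::int64_t samplesAlreadyReturned)
{
    const std::int64_t target = samplesUpTo(nextVideoTime100ns, sampleRate);
    return target > samplesAlreadyReturned ? target - samplesAlreadyReturned : 0;
}
```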

alalek

Well done 👍

add audio support in cap_msmf

* audio msmf
* fixed warnings
* minor fix
* fixed SampleTime MSMF
* minor fix, fixed audio test, retrieveAudioFrame
* fixed warnings
* impelemented sync audio and video stream with start offset
* fixed error
* fixed docs
* fixed audio sample
* CAP_PROP_AUDIO_POS, minor fixed
* fixed warnings
* videoio(MSMF): update audio test checks, add debug logging
* fixed
* fixed desynchronization of time positions, warnings
* fixed warnings
* videoio(audio): tune tests checks
* videoio(audio): update properties description
* build warnings

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>

MaximMilashchenko changed the title from "audio" to "add audio support in cap_msmf" Mar 13, 2021

alalek reviewed Mar 23, 2021

allnes linked an issue Mar 25, 2021 that may be closed by this pull request

Support audio in OpenCV #16394

Open

alalek reviewed Apr 29, 2021

alalek reviewed May 1, 2021

alalek reviewed Jun 3, 2021

MaximMilashchenko force-pushed the Audio branch from 4c1cb09 to 223041b June 21, 2021 14:50

MaximMilashchenko force-pushed the Audio branch 2 times, most recently from 86f5c78 to a7e9d4d August 26, 2021 15:25

alalek reviewed Sep 1, 2021

fzhar reviewed Sep 1, 2021

fzhar suggested changes Sep 1, 2021

alalek reviewed Sep 6, 2021

audio msmf

31d3b96

MaximMilashchenko force-pushed the Audio branch from d50197d to 31d3b96 September 11, 2021 11:25

Milashchenko added 3 commits September 15, 2021 06:01

fixed warnings

11345d9

minor fix

fc83b4f

fixed SampleTime MSMF

ac62aee

alalek reviewed Sep 27, 2021

Milashchenko added 2 commits September 29, 2021 21:29

minor fix, fixed audio test, retrieveAudioFrame

a08a612

fixed warnings

6afde7c

alalek reviewed Oct 1, 2021

spazewalker mentioned this pull request Oct 3, 2021

speech recognition sample #20291

Merged

MaximMilashchenko added 4 commits October 12, 2021 13:56

impelemented sync audio and video stream with start offset

5c5f607

fixed error

91307c3

fixed docs

47e1ed4

fixed audio sample

bfd6793

alalek reviewed Oct 15, 2021

MaximMilashchenko added 2 commits October 17, 2021 22:17

CAP_PROP_AUDIO_POS, minor fixed

ce4618a

fixed warnings

62bb2de

alalek reviewed Oct 18, 2021

modules/videoio/test/test_audio.cpp (outdated, resolved)

videoio(MSMF): update audio test checks, add debug logging

43342be

alalek reviewed Oct 18, 2021

MaximMilashchenko and others added 5 commits October 19, 2021 19:41

fixed

36e9258

fixed desynchronization of time positions, warnings

1d9f6e8

fixed warnings

c8859e2

videoio(audio): tune tests checks

75775cb

videoio(audio): update properties description

fad6a28

alalek approved these changes Oct 20, 2021

build warnings

0642119

alalek merged commit f36c268 into opencv:master Oct 20, 2021

alalek mentioned this pull request Dec 30, 2021

(5.x) Merge 4.x #21371

Merged

MaximMilashchenko deleted the Audio branch January 11, 2022 14:27

alalek mentioned this pull request Feb 22, 2022

(5.x) Merge 4.x #21651

Merged

