CARVIEW |
Navigation Menu
-
-
Notifications
You must be signed in to change notification settings - Fork 56.2k
add audio support in cap_msmf #19721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@alalek, I fixed the errors, but there is a warning because the patch size is exceeded. Also, the build failed on one of the Jenkins. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
patch size opencv_extra: 114698 KiB
Where is the link on opencv_extra's PR?
114 Mb is very large. Existed video test data is even smaller. Test data must be reduced.
@allnes do we have an issue about Audio support? Link should be added.
PR's description should not be empty. At least it should define the scope of work, how/what should be validated, prerequisites, etc.
@@ -315,6 +315,9 @@ enum | |||
CV_CAP_PROP_XI_SENSOR_FEATURE_SELECTOR = 585, // Selects the current feature which is accessible by XI_PRM_SENSOR_FEATURE_VALUE. | |||
CV_CAP_PROP_XI_SENSOR_FEATURE_VALUE = 586, // Allows access to sensor feature value currently selected by XI_PRM_SENSOR_FEATURE_SELECTOR. | |||
|
|||
//Properties of audio VideoIO | |||
CV_CAP_PROP_AUDIO_ENABLE = 1000, // Select audio or video | |||
CV_CAP_PROP_BPS = 1001, // Change bit_per_sample parametr for audio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
legacy/constants_c.h
Legacy files must not be touched without critical reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this file to use the same style of property names inside the backend (CV_CAP_PROP_...). If I don't change the file then I need to use CAP_PROP_...
@@ -185,6 +185,8 @@ enum VideoCaptureProperties { | |||
CAP_PROP_ORIENTATION_AUTO=49, //!< if true - rotates output frames of CvCapture considering video file's metadata (applicable for FFmpeg back-end only) (https://github.com/opencv/opencv/issues/15499) | |||
CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific. | |||
CAP_PROP_HW_DEVICE =51, //!< (**open-only**) Hardware device index (select GPU if multiple available) | |||
CAP_PROP_AUDIO_ENABLE =1000, | |||
CAP_PROP_BPS =1001, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CAP_PROP_BPS
What does that mean in the context of VideoCapture
?
"Bytes per second"? (the most incorrect case)
@@ -185,6 +185,8 @@ enum VideoCaptureProperties { | |||
CAP_PROP_ORIENTATION_AUTO=49, //!< if true - rotates output frames of CvCapture considering video file's metadata (applicable for FFmpeg back-end only) (https://github.com/opencv/opencv/issues/15499) | |||
CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific. | |||
CAP_PROP_HW_DEVICE =51, //!< (**open-only**) Hardware device index (select GPU if multiple available) | |||
CAP_PROP_AUDIO_ENABLE =1000, | |||
CAP_PROP_BPS =1001, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proper documentation must be added.
@@ -185,6 +185,8 @@ enum VideoCaptureProperties { | |||
CAP_PROP_ORIENTATION_AUTO=49, //!< if true - rotates output frames of CvCapture considering video file's metadata (applicable for FFmpeg back-end only) (https://github.com/opencv/opencv/issues/15499) | |||
CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific. | |||
CAP_PROP_HW_DEVICE =51, //!< (**open-only**) Hardware device index (select GPU if multiple available) | |||
CAP_PROP_AUDIO_ENABLE =1000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=1000
Why? Why is not 10000000?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted the value in this file to match the values in the constants_c.h file.
modules/videoio/test/test_audio.cpp
Outdated
@@ -0,0 +1,150 @@ | |||
#include <tuple> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mandatory requirements of OpenCV library:
- proper license header
- test_precomp.hpp must be included first.
@@ -185,6 +185,8 @@ enum VideoCaptureProperties { | |||
CAP_PROP_ORIENTATION_AUTO=49, //!< if true - rotates output frames of CvCapture considering video file's metadata (applicable for FFmpeg back-end only) (https://github.com/opencv/opencv/issues/15499) | |||
CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific. | |||
CAP_PROP_HW_DEVICE =51, //!< (**open-only**) Hardware device index (select GPU if multiple available) | |||
CAP_PROP_AUDIO_ENABLE =1000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Select audio or video
We should support processing of both video and audio streams. This is why we are integrating into VideoCapture
instead of adding dedicated Audio API.
The next requirement is that video and audio must be synchronized in a some way (need to define these details and write them somewhere. Other "backends" and user apps should follow there guidelines).
Also we need these properties:
- (Priority 0) CAP_PROP_VIDEO_STREAM - (open-only, 0-based index or -1 to disable) - Default value is 0.
- (Priority 0) CAP_PROP_AUDIO_STREAM - (open-only, 0-based index or -1 to disable) - Specify stream in multi-language media files. -1 - disable audio processing (default)
- (Priority 1) CAP_PROP_AUDIO_POS - (read-only, in samples) accurate audio sample timestamp of previous grabbed fragment. See CAP_PROP_AUDIO_SAMPLES_PER_SECOND.
- (Priority 1) CAP_PROP_AUDIO_DATA_DEPTH - (open-only, Mat::depth()) Default value is -1: use "native" file/codec information. Alternative definition to bits-per-sample, but with clear handling of 32F / 32S.
- (Priority 1) CAP_PROP_AUDIO_SAMPLES_PER_SECOND - (open-only) 0 - determine from file/codec input. If not specified through parameters or input, then selected audio sample rate is 44100. Note: resampling is not performed by OpenCV.
(Priority 2) CAP_PROP_AUDIO_CHANNELS - (open-only?, bitset) - to properly handle stereo or 5.1 streams. ? 0 - use average of all channels. Use -1 to grab all channels. Default value is 0 / -1 ?- (Priority 3) CAP_PROP_AUDIO_TOTAL_SAMPLES - (read-only) Number of samples in the file. Returns zero if information is not available (live streams).
- (Priority 3) CAP_PROP_AUDIO_TOTAL_CHANNELS - (read-only) Number of audio channels in the selected audio stream.
- (Priority 4) CAP_PROP_AUDIO_TOTAL_STREAMS - (read-only) Number of audio stream in the used media.
- 44100 has problem with 24 FPS video (1827.5 audio samples per frame). 48000 is good for 24,25,30 FPS media streams, but it has less support from codecs. See https://en.wikipedia.org/wiki/Sampling_(signal_processing)
Update 2021-04-30:
- drop
CAP_PROP_AUDIO_CHANNELS
due to confusion withCAP_PROP_AUDIO_TOTAL_CHANNELS
. - will be replaced by
CAP_PROP_AUDIO_CHANNELS_RETRIEVE_MODE
later (not in the scope of this PR)
TBD later
int apiID = cv::CAP_MSMF; | ||
//congigurate VideoCapture for video and audio | ||
VideoCapture cap_video; | ||
VideoCapture cap_audio; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong way.
As said above, we are integrating into VideoCapture to handle both streams instead of adding a dedicated Audio API.
return -1; | ||
} | ||
// open selected micro using selected API | ||
std::vector<int> params { CAP_PROP_AUDIO_ENABLE , static_cast<int>(1) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static_cast(1)
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guided on creating a vector of parameters in a video writer (cap.cpp). There the bool cast to int, in my case it is not required. I will correct this.
samples/cpp/videocapture_audio.cpp
Outdated
// open selected micro using selected API | ||
cap.open(0, apiID, params); | ||
if (!cap.isOpened()) { | ||
cerr << "ERROR! Can't to open file\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\n
There is std::endl
in C++.
jenkins cn please retry a build |
enum DeviceSatus | ||
{ | ||
NONDEVICE = -1, | ||
CAMERA = 0, | ||
MICROPHONE = 1 | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why users need that in public API?
@@ -315,7 +315,6 @@ enum | |||
CV_CAP_PROP_XI_SENSOR_FEATURE_SELECTOR = 585, // Selects the current feature which is accessible by XI_PRM_SENSOR_FEATURE_VALUE. | |||
CV_CAP_PROP_XI_SENSOR_FEATURE_VALUE = 586, // Allows access to sensor feature value currently selected by XI_PRM_SENSOR_FEATURE_SELECTOR. | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert all changes in this file, including touching of empty lines.
modules/videoio/src/cap_msmf.cpp
Outdated
res.majorType = MFMediaType_Audio; | ||
res.subType = MFAudioFormat_PCM; | ||
res.bit_per_sample = 32; | ||
res.nChannels = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should start from handling of 'Mono' audio streams
modules/videoio/src/cap_msmf.cpp
Outdated
const double thisRateDiff = absDiff(getFramerate(), ref.getFramerate()); | ||
const double otherRateDiff = absDiff(other.getFramerate(), ref.getFramerate()); | ||
if (thisRateDiff < otherRateDiff) | ||
if (width > other.width) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename original function to handle "video"only data and create new one for audio and "generic".
Keep patches small, code history clear.
modules/videoio/src/cap_msmf.cpp
Outdated
if (thisDiff < otherDiff) | ||
return true; | ||
if (thisDiff == otherDiff) | ||
if(majorType == MFMediaType_Video) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code format: if<space>()
if
is not a function, it is C++ statement.
The same note is about for()
const int audio_base_index = cap.get(cv::CAP_PROP_AUDIO_BASE_INDEX); | ||
for (;;) | ||
{ | ||
cap.read(video_frame); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't lost returned results.
Show how to handle them.
@@ -185,11 +185,28 @@ enum VideoCaptureProperties { | |||
CAP_PROP_ORIENTATION_AUTO=49, //!< if true - rotates output frames of CvCapture considering video file's metadata (applicable for FFmpeg back-end only) (https://github.com/opencv/opencv/issues/15499) | |||
CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific. | |||
CAP_PROP_HW_DEVICE =51, //!< (**open-only**) Hardware device index (select GPU if multiple available) | |||
CAP_PROP_ENABLE_MICROPHONE = 52, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CAP_PROP_ENABLE_MICROPHONE
Why do we need dedicated property for that?
CAP_PROP_AUDIO_SAMPLES_PER_SECOND = 57, | ||
CAP_PROP_AUDIO_CHANNELS = 58, | ||
CAP_PROP_AUDIO_BASE_INDEX = 59, | ||
CAP_PROP_AUDIO_TOTAL_CHANNELS = 60, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CAP_PROP_AUDIO_CHANNELS
CAP_PROP_AUDIO_TOTAL_CHANNELS
Documentation should be added.
modules/videoio/src/cap_msmf.cpp
Outdated
@@ -951,15 +1139,14 @@ bool CvCapture_MSMF::open(const cv::String& _filename, const cv::VideoCapturePar | |||
} | |||
else | |||
duration = 0; | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read OpenCV contribution guidelines on Wiki and install necessary Git hooks.
modules/videoio/src/cap_msmf.cpp
Outdated
MediaType captureFormat; | ||
MediaType captureVideoFormat; | ||
MediaType captureAudioFormat; | ||
bool deviceType; // false - camera, true - audio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool deviceType; // false - camera, true - audio
... and somewhere we have recorded streams too.
This looks overcomplicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated comment with properties proposals too.
modules/videoio/src/cap_msmf.cpp
Outdated
} | ||
else | ||
{ | ||
cv::Mat(1, cursize, CV_8UC1, ptr, pitch).copyTo(frame); | ||
//switch(index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is cv::extractChannel()
call in OpenCV which can help to extract channel from interleaved data.
Can i get a common structure of the OpenCv development here, so i can know the idea forward, relative to this. i have been busy and could not follow a lot of trends on opencv. can one just make a simple summary? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All CI builds are failed due to merge conflict.
Squash commits and then rebase on upstream branch (in that order).
samples/cpp/videocapture_audio.cpp
Outdated
#include <fstream> | ||
#include <stdio.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fstream
stdio.h
Used?
samples/cpp/videocapture_audio.cpp
Outdated
} | ||
|
||
const int audioBaseIndex = cap.get(cv::CAP_PROP_AUDIO_BASE_INDEX); | ||
const int numberOfChannels = cap.get(cv::CAP_PROP_AUDIO_TOTAL_CHANNELS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cv::
we have using namespace cv
in all samples to avoid using of cv::
prefix
samples/cpp/videocapture_audio.cpp
Outdated
cout << "Number of samples: " << cap.get(cv::CAP_PROP_AUDIO_POS) << endl; | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How we can get here?
This is the "error" path.
.grab()
should handle normal end-of-stream handling.
if (audioFrame.empty() && videoFrame.empty()) | ||
{ | ||
cerr << "ERROR! blank frame grabbed" << endl; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why .grab()
didn't handle that?
cap.grab(); | ||
cap.retrieve(frame, audio_base_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing check of the returned results.
modules/videoio/test/test_audio.cpp
Outdated
VideoCapture cap; | ||
}; | ||
|
||
class aud : public AudioTestFixture{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aud
What is the problem to use normal name?
modules/videoio/test/test_audio.cpp
Outdated
void comparison() | ||
{ | ||
for(int i = 0; i < validData.size(); i++) | ||
ASSERT_TRUE(fabs(validData[i] - fileData[i]) < epsilon); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Google test provides ASSERT_NEAR
with accurate error reporting.
Also use (...) << i;
to dump index at least.
modules/videoio/test/test_audio.cpp
Outdated
void comparison() | ||
{ | ||
for(int i = 0; i < validData.size(); i++) | ||
ASSERT_TRUE(fabs(validData[i] - fileData[i]) < epsilon); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fileData[i]
missing check for buffer overflow.
modules/videoio/test/test_audio.cpp
Outdated
for(int j = 0; j < 44100*3; j += 44100) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is not from 0 to 3?
Hello, @MaximMilashchenko : Is it possible to attach a full diff (with master) ? I'd like to test the whole changes. Thanks in advance, |
@ebachard Check links at page footer: "ProTip! Add .patch or .diff to the end of URLs for Git’s plaintext views." |
@alalek : thanks for the tip. In fact, in meantime I created a full diff including new created files, and I'll give it a try. First, I'll do a (personal) code review, to understand how things work, and then I'll experiment. Thanks again ! |
86f5c78
to
a7e9d4d
Compare
CAP_PROP_AUDIO_BASE_INDEX = 60, //!< Number of video channels | ||
CAP_PROP_AUDIO_TOTAL_CHANNELS = 61, //!< Number of audio channels in the selected audio stream. | ||
CAP_PROP_AUDIO_TOTAL_STREAMS = 62, //!< Number of audio stream in the used media. | ||
CAP_PROP_SYNC_LAST_FRAME = 63, //!< Defult value is 1 (the last audio frame is synchronized with the video frame by duration), 0 is no audio and video last frames sync(the last audio frame will contain all remaining audio data. The duration of the received audio data may be longer than the duration of the received video data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: Default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to keep AUDIO prefix:
CAP_PROP_SYNC_LAST_FRAME => CAP_PROP_AUDIO_SYNC_LAST_FRAME
CAP_PROP_AUDIO_POS = 57, //!< Audio position is measured in samples. Accurate audio sample timestamp of previous grabbed fragment. See CAP_PROP_AUDIO_SAMPLES_PER_SECOND | ||
CAP_PROP_AUDIO_DATA_DEPTH = 58, //!< Alternative definition to bits-per-sample, but with clear handling of 32F / 32S | ||
CAP_PROP_AUDIO_SAMPLES_PER_SECOND = 59, //!< determined from file/codec input. If not specified, then selected audio sample rate is 44100 | ||
CAP_PROP_AUDIO_BASE_INDEX = 60, //!< Number of video channels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Number of video channels
(read-only) Index of first audio channel. Used as parameter for .retrieve()
call.
Perhaps we should have: CAP_PROP_VIDEO_TOTAL_CHANNELS
(currently with the same value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, it's not clear why Audio_base_index sets number of video channels
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do both. (add total and rename to VIDEO_TOTAL_CHANNELS). Also, you need to define clearly in docs that audio channel number is not zero-indexed, it continues enumeration after video channels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Number of video channels
(read-only) Index of first audio channel. Used as parameter for
.retrieve()
call.Perhaps we should have:
CAP_PROP_VIDEO_TOTAL_CHANNELS
(currently with the same value)
What do I have to do here: to rename CAP_PROP_AUDIO_BASE_INDEX=>CAP_PROP_VIDEO_TOTAL_CHANNELS or to add a new CAP_PROP_VIDEO_TOTAL_CHANNELS property and to change the CAP_PROP_AUDIO_BASE_INDEX description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CAP_PROP_AUDIO_BASE_INDEX
should be here with updated comment (see above).- add new property
CAP_PROP_VIDEO_TOTAL_CHANNELS
(for RGBD or stereo streams/cameras)
modules/videoio/src/cap_msmf.cpp
Outdated
@@ -69,7 +69,7 @@ static void init_MFCreateDXGIDeviceManager() | |||
#endif | |||
|
|||
#include <mferror.h> | |||
|
|||
#include <fstream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use this?
modules/videoio/src/cap_msmf.cpp
Outdated
virtual bool grabFrame() CV_OVERRIDE; | ||
bool retrieveAudioFrame(int, cv::OutputArray); | ||
bool retrieveVideoFrame(cv::OutputArray); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cv::
Should not be used in OpenCV code.
Exception is cv::format()
(to avoid problem with std::format()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cv::
Should not be used in OpenCV code.
Exception is
cv::format()
(to avoid problem withstd::format()
)
In master, cv:: is used with this argument. Is it necessary to remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove from modified/added code.
No need to change unmodified parts.
modules/videoio/src/cap_msmf.cpp
Outdated
buf = NULL; | ||
} | ||
audioSamples.clear(); | ||
int numberOfByte = (videoStream != -1) ? (int)((double)(curVideoTime/1e7)*captureAudioFormat.nSamplesPerSec*captureAudioFormat.nChannels*(captureAudioFormat.bit_per_sample)/8) : cursize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curVideoTime/1e7
It is not accurate with integer numbers:
LONGLONG curVideoTime;
Division should be the last operation of expression (also avoid overflow - int64_t
should be enough).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numberOfByte is confusing. What is it - start of data chunk or chunk's length ? Please rename (using start or length).
What is magic number 1e7 - does it come from MSFT measurements in units of 0.1us ?
(double) cast is extra here, 1e7 should be double and will autocast everything else to double.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curVideoTime/1e7
It is not accurate with integer numbers:
LONGLONG curVideoTime;
Division should be the last operation of expression (also avoid overflow -
int64_t
should be enough).
Mat accepts arguments with the int type. If we take int64_t numberOfByte, then the conversion of int64_t types to int will occur in the Mat constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- compute as int64_t
- avoid doubles / floating-point computations
- check range of final result and cast to "int" if needed
ASSERT_TRUE(cap.retrieve(videoFrame)); | ||
ASSERT_TRUE(cap.retrieve(audioFrame, audioBaseIndex)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provide useful messages about failed checks. Add number of processed frame.
There are different cases with failures on frame 0 (nothing works here), and some frame 345 (something works).
Use GoogleTest SCOPED_TRACE
for that.
{ | ||
ASSERT_TRUE(cap.retrieve(videoFrame)); | ||
ASSERT_TRUE(cap.retrieve(audioFrame, audioBaseIndex)); | ||
ASSERT_EQ(audioFrame.cols/44100, 1/fps); // check if the duration of the received audio data satisfies one video frame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this check may fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. If the duration of the received audio data does not equal to one frame of the video, the check will not be passed. This is possible if synchronization does not work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided code doesn't look valid. Looks like it always compare 0 vs 0.
Use this instead:
int samplesPerFrame = 1/fps*44100;
// 44100 should be checked and replaced to CAP_PROP_AUDIO_SAMPLES_PER_SECOND
property.
int audioSamplesTolerance = samplesPerFrame / 2;
We should have these checks:
- position: abs(CAP_PROP_AUDIO_POS - CAP_PROP_POS_MSEC / 1000 * 44100) < audioSamplesTolerance
- duration: abs(audioFrame.cols - samplesPerFrame) < audioSamplesTolerance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided code doesn't look valid. Looks like it always compare 0 vs 0.
Use this instead:
int samplesPerFrame = 1/fps*44100;
// 44100 should be checked and replaced toCAP_PROP_AUDIO_SAMPLES_PER_SECOND
property.int audioSamplesTolerance = samplesPerFrame / 2;
We should have these checks:
- position: abs(CAP_PROP_AUDIO_POS - CAP_PROP_POS_MSEC / 1000 * 44100) < audioSamplesTolerance
- duration: abs(audioFrame.cols - samplesPerFrame) < audioSamplesTolerance
The CAP_PROP_POS_MSEC value is not always valid
ASSERT_EQ(audioFrame.cols/44100, 1/fps); // check if the duration of the received audio data satisfies one video frame | ||
if (!videoFrame.empty()) | ||
videoData.push_back(videoFrame); | ||
for (int i = 0; i < audioFrame.cols; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add check for audioFrame.type()
as code below assumes CV_16SC1
.
modules/videoio/test/test_audio.cpp
Outdated
} | ||
ASSERT_FALSE(fileData.empty()); | ||
} | ||
void comparison() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkAudio
validateAudio
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkAudio
validateAudio
What does this mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion to use more clear name
@@ -246,7 +299,7 @@ struct MediaType | |||
return wdiff + hdiff; | |||
} | |||
// check if 'this' is better than 'other' comparing to reference | |||
bool isBetterThan(const MediaType& other, const MediaType& ref) const | |||
bool VideoIsBetterThan(const MediaType& other, const MediaType& ref) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually video quality is compared not only by resolution, but also by bitrate and FRC mode
modules/videoio/src/cap_msmf.cpp
Outdated
buf = NULL; | ||
} | ||
audioSamples.clear(); | ||
int numberOfByte = (videoStream != -1) ? (int)((double)(curVideoTime/1e7)*captureAudioFormat.nSamplesPerSec*captureAudioFormat.nChannels*(captureAudioFormat.bit_per_sample)/8) : cursize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numberOfByte is confusing. What is it - start of data chunk or chunk's length ? Please rename (using start or length).
What is magic number 1e7 - does it come from MSFT measurements in units of 0.1us ?
(double) cast is extra here, 1e7 should be double and will autocast everything else to double.
modules/videoio/src/cap_msmf.cpp
Outdated
} | ||
} | ||
_ComPtr<IMFMediaBuffer> buf = NULL; | ||
BYTE* useAudioData = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
audioDataBuffer or audioDataInUse ? useAudioData sounds like function name or boolean flag
modules/videoio/src/cap_msmf.cpp
Outdated
for (int i = 0; i < numberOfByte; i++) | ||
{ | ||
useAudioData[i] = bufferAudioData[i]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't we use memcpy_s or std::copy (in case of using std::dequeue instead of vector for bufferAudioData)?
modules/videoio/src/cap_msmf.cpp
Outdated
LONGLONG audioSamplePos; | ||
DWORD numberOfAudioStreams; | ||
Mat audioFrame; | ||
std::vector<BYTE> bufferAudioData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as this buffer will be used as queue for samples, I'd suggest to use ring queue or std::deque to speed it up. This will be noticeable on long queues only though.
modules/videoio/src/cap_msmf.cpp
Outdated
|
||
if (!SUCCEEDED(buf->Lock(&ptr, &maxsize, &cursize))) | ||
break; | ||
for (unsigned int i = 0; i < cursize; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do .reserve first to speed it up. Or do resize and just fill the data to the tail.
} | ||
if (!audioFrame.empty()) | ||
{ | ||
audioData.push_back(audioFrame); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, it would be good to do something with decoded data. As long as we have live video,you may paint oscillogram of decoded chunk over the video frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fzhar Audio writer is out of scope of this PR. It would be added later.
for (int nCh = 0; nCh < numberOfChannels; nCh++) | ||
{ | ||
cap.retrieve(frame, audioBaseIndex+nCh); | ||
audioData.push_back(frame); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above. Writing audio to the file would be nice.
modules/videoio/src/cap_msmf.cpp
Outdated
{ | ||
DWORD streamIndex, flags; | ||
HRESULT hr; | ||
std::vector<bool> outInstalFlag; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need vector of flags here ? At the end they are just and-ed, mybe better use single boolean flag instead of vector ?
modules/videoio/src/cap_msmf.cpp
Outdated
} | ||
else if (isOpen) | ||
{ | ||
std::vector<bool> outInstalFlag; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vector is really extra here. jut use single bool
modules/videoio/src/cap_msmf.cpp
Outdated
} | ||
else | ||
{ | ||
sampleTime += frameStep; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sampleTime += frameStep;
This may provide inaccurate values.
Use timestamp value from ReadSample. See code related to m_lastSampleTimestamp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sampleTime += frameStep;
This may provide inaccurate values.
Use timestamp value from ReadSample. See code related tom_lastSampleTimestamp
.
Timestamp value from Read Sample does not always indicate the end of the captured sample. Example, in the test case of {"mov", "H 264", 30. f, CAP_MSMF} (videoio_synthetic, write_read_position ) the time stamp in ReadSample is set at the beginning of the captured frame and for correct operation it is necessary to calculate SampleTime= + frameStep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the end of the captured sample
end
This is not needed at all and can NOT be properly determined for VFR videos (without reading of the next frame).
Use provided value from the decoder/demuxer.
Compare results with FFmpeg backend.
modules/videoio/src/cap_msmf.cpp
Outdated
case CV_CAP_PROP_POS_FRAMES: | ||
return floor(((double)sampleTime / 1e7)* captureFormat.getFramerate() + 0.5); | ||
return floor(((double)sampleTime / 1e7)* captureVideoFormat.getFramerate() + 0.5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use direct frame counter by counting of grab() / ReadSample() calls.
Update also ::setTime()
method. Seek to zero should be accurate, others are not (perhaps we should not return property value after that)
d50197d
to
31d3b96
Compare
modules/videoio/test/test_audio.cpp
Outdated
typedef std::tuple<std::string, double, int, int, int, int, int, double, std::pair<std::string, int> > paramCombination; | ||
typedef std::tuple<std::string, double, std::pair<std::string, int> > param; | ||
|
||
class baseAudio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
baseAudio must start with capital letter
baseAudio => AudioBaseTest
modules/videoio/test/test_audio.cpp
Outdated
void getValidAudioData() | ||
{ | ||
const double step = 3.14/22050; | ||
double value = 0; | ||
for (int j = 0; j < 3; j++) | ||
{ | ||
value = 0; | ||
for (int i = 0; i < 44100; i++) | ||
{ | ||
validAudioData.push_back(sin(value)); | ||
value += step; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code duplication
modules/videoio/test/test_audio.cpp
Outdated
void comparisonAudio() | ||
{ | ||
for (unsigned int i = 0; i < audioData.size(); i++) | ||
{ | ||
EXPECT_LE(fabs(validAudioData[i] - audioData[i]), epsilon) << "sample index " << i; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code duplication
modules/videoio/test/test_audio.cpp
Outdated
EXPECT_LE(fabs(validAudioData[i] - audioData[i]), epsilon) << "sample index " << i; | ||
} | ||
} | ||
void comparisonVideo() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comparisonVideo => checkVideoFrames
modules/videoio/test/test_audio.cpp
Outdated
void getValidAudioData() | ||
{ | ||
const double step = 3.14/22050; | ||
double value = 0; | ||
for(int j = 0; j < 3; j++) | ||
{ | ||
value = 0; | ||
for(int i = 0; i < 44100; i++) | ||
{ | ||
validAudioData.push_back(sin(value)); | ||
value += step; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code duplication
samples/cpp/videocapture_audio.cpp
Outdated
int numberOfSamles = 0; | ||
for (auto item : audioData) | ||
numberOfSamles+=item.cols; | ||
cout << "Number of samples: " << numberOfSamles << endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move out post processing code from the for
loop.
Keep for
loops readable - minimal as possible.
cap.open(0, CAP_MSMF, params); | ||
if (!cap.isOpened()) | ||
{ | ||
cerr << "ERROR! Can't to open file" << endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file
there is no file
const double cvTickFreq = getTickFrequency(); | ||
int64 sysTimeCurr = getTickCount(); | ||
int64 sysTimePrev = sysTimeCurr; | ||
while ((sysTimeCurr-sysTimePrev)/cvTickFreq < 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to emit message that audio would be captured for the next 10 seconds.
To avoid confusion with program hang.
if (cap.grab()) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"else" handling is missing.
What if user disconnect microphone device?
} | ||
if (!audioFrame.empty()) | ||
{ | ||
audioData.push_back(audioFrame); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fzhar Audio writer is out of scope of this PR. It would be added later.
modules/videoio/src/cap_msmf.cpp
Outdated
try | ||
{ | ||
if (chunkLengthOfBytes < INT_MIN || chunkLengthOfBytes > INT_MAX) | ||
throw "The chunkLengthOfBytes is out of the allowed range"; | ||
} | ||
catch (const std::exception& e) | ||
{ | ||
CV_LOG_WARNING(NULL, "MSMF: Exception is raised: " << e.what()); | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't work properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How I can fix that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look for similar code in the library. Use CV_Check.
modules/videoio/src/cap_msmf.cpp
Outdated
} | ||
copy(bufferAudioData.begin(), bufferAudioData.begin()+chunkLengthOfBytes, std::back_inserter(audioDataInUse)); | ||
bufferAudioData.erase(bufferAudioData.begin(), bufferAudioData.begin()+chunkLengthOfBytes); | ||
residualTime = (double)(bufferAudioData.size()/((captureAudioFormat.bit_per_sample/8)*captureAudioFormat.nChannels))/captureAudioFormat.nSamplesPerSec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
residualTime
Wrong logic is here.
.retrive()
must be idempotent implementation. It can be called many times with same parameters or not called at all.
It is wrong to modify residualTime
variable.
State must be changed during .grab()
call only.
modules/videoio/src/cap_msmf.cpp
Outdated
bool returnFlag = true; | ||
if (videoStream != -1) | ||
returnFlag &= grabVideoFrame(); | ||
if (audioStream != -1) | ||
returnFlag &= grabAudioFrame(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we fail to capture video, then there is no reason to capture audio. We already failed.
modules/videoio/src/cap_msmf.cpp
Outdated
0, // Flags. | ||
&streamIndex, // Receives the actual stream index. | ||
&flags, // Receives status flags. | ||
&sampleTime, // Receives the time stamp. | ||
&videoSample // Receives the sample or NULL. | ||
NULL, // Receives the time stamp. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why audio ignores timestamp?
This is why the current logic fails on sample_960x400_ocean_with_audio.mp4
Add debug dump of values from ReadSample:
CvCapture_MSMF::initStream Init stream 1 with MediaType (960x400 @ 23.976) MFVideoFormat_RGB32
CvCapture_MSMF::initStream Init stream 0 with MediaType (0x0 @ 1) ☺
CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=1668332
CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=213333
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=206349
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=419682
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379
...
CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=2085415
CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=633061
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=846441
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213152
...
CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=2502498
CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=1059593
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=1272972
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379
Audio stream starts 4+ frames before video. We need to capture all of this audio data and return with the first video frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why audio ignores timestamp?
This is why the current logic fails on
sample_960x400_ocean_with_audio.mp4
Add debug dump of values from ReadSample:
CvCapture_MSMF::initStream Init stream 1 with MediaType (960x400 @ 23.976) MFVideoFormat_RGB32 CvCapture_MSMF::initStream Init stream 0 with MediaType (0x0 @ 1) ☺ CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=1668332 CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083 CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=213333 CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=206349 CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=419682 CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379 ... CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=2085415 CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083 CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=633061 CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379 CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=846441 CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213152 ... CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=2502498 CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083 CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=1059593 CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379 CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=1272972 CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379
Audio stream starts 4+ frames before video. We need to capture all of this audio data and return with the first video frame.
Audio ignores timestamp because there was no need to use it. I will fix it.
modules/videoio/src/cap_msmf.cpp
Outdated
bool CvCapture_MSMF::setTime(int numberFrame) | ||
{ | ||
if(videoStream == -1) | ||
return false; | ||
PROPVARIANT var; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should forbid seeking in case of Video+Audio capturing, except seek to beginning of the media file.
Other values are just not accurate even for Video-only cases.
The SetCurrentPosition method does not guarantee exact seeking. The accuracy of the seek depends on the media content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should forbid seeking in case of Video+Audio capturing, except seek to beginning of the media file.
Other values are just not accurate even for Video-only cases.
The SetCurrentPosition method does not guarantee exact seeking. The accuracy of the seek depends on the media content.
We can manually rewind to any frame by setting SetCurrentPosition to 0, call grab and release the frames until we reach the desired one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call grab and release the frames until we reach the desired one
Then the next step will be receiving a bug report about why seeking implementation is so slow.
If we want to implement feature, then we should it implemented in the best way. Otherwise we would get unresolvable threads like these: #9053
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should forbid seeking in case of Video+Audio capturing, except seek to beginning of the media file.
Then I can forbid setting CV_CAP_PROP_YPOS_FRAMES and CV_CAP_PROP_YPOS_MSC in case of Video+Audio capturing and allow setting CV_CAP_PROP_POS_AVI_RATIO for values 0 and 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
except seek to beginning of the media file.
Seek to zero should be supported for all modes (this case is well defined and just works).
modules/videoio/test/test_audio.cpp
Outdated
if (!videoFrame.empty()) | ||
videoData.push_back(videoFrame); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
videoFrame.empty()
Must be always false on .grab()
success.
Current last frame synchronization logic doesn't work properly.
modules/videoio/test/test_audio.cpp
Outdated
|
||
const paramCombination mediaParams[] = | ||
{ | ||
paramCombination("mp4", 1, 0.15, CV_8UC3, 240, 320, 90, 30, 30., {"CAP_MSMF", cv::CAP_MSMF}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"CAP_MSMF", cv::CAP_MSMF
Which other tests uses the same scheme? Why we need string here?
modules/videoio/test/test_audio.cpp
Outdated
} | ||
ASSERT_LT(abs(cap.get(CAP_PROP_AUDIO_POS) - (cap.get(CAP_PROP_POS_MSEC)/ 1000 + 1./fps) * samplePerSecond), audioSamplesTolerance); | ||
ASSERT_LT(abs(audioFrame.cols - samplesPerFrame), audioSamplesTolerance); | ||
ASSERT_EQ(CV_16SC1, audioFrame.type()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type check is placed after data usage.
modules/videoio/test/test_audio.cpp
Outdated
} | ||
} | ||
ASSERT_LT(abs(cap.get(CAP_PROP_AUDIO_POS) - (cap.get(CAP_PROP_POS_MSEC)/ 1000 + 1./fps) * samplePerSecond), audioSamplesTolerance); | ||
ASSERT_LT(abs(audioFrame.cols - samplesPerFrame), audioSamplesTolerance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
audioFrame.cols
must be same across channels.
audioFrame.cols
should not be checked for the first frame (or multiple no-audio frames in the beginning) and the last frame (due to the last frame synchronization, which reads audio till the end).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
audioFrame.cols
must be same across channels.
Do I need to check it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, test should check that
modules/videoio/test/test_audio.cpp
Outdated
audioData[nCh].push_back(f); | ||
} | ||
} | ||
ASSERT_LT(abs(cap.get(CAP_PROP_AUDIO_POS) - (cap.get(CAP_PROP_POS_MSEC)/ 1000 + 1./fps) * samplePerSecond), audioSamplesTolerance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this check to work we need one more read-only property which contains timestamp shift between audio and video streams (of the first captured data)
timestamp shift should be in nanoseconds.
How the pipeline should behave when the audio stream is shorter than the video stream? that is, the audio has an offset at the beginning and may end earlier |
Last frames should return empty audio data. Audio data for the first frame is larger that for others (due to audio starts before video) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run test locally with
paramCombination("sample_960x400_ocean_with_audio.mp4", 2, 0.15, CV_8UC3, 400, 960, 1116, 30, 30., cv::CAP_MSMF)
Several test conditions still fail (besides of audio gold results).
modules/videoio/test/test_audio.cpp
Outdated
for (unsigned int nCh = 0; nCh < audioData.size(); nCh++) | ||
for (unsigned int i = 0; i < audioData[nCh].size(); i++) | ||
{ | ||
EXPECT_LE(fabs(validAudioData[nCh][i] - audioData[nCh][i]), epsilon) << "sample index " << i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, if readFile()
fails or read more data then that code tried to perform out of buffer access.
nCh < audioData.size()
Checks constraints based on input data are logically incorrect. Gold values and test parameters must be used instead.
modules/videoio/test/test_audio.cpp
Outdated
if (nCh == 0) | ||
audioFrameCols = audioFrame.cols; | ||
else | ||
ASSERT_EQ(audioFrameCols, audioFrame.cols); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<< nCh;
or use SCOPED_TRACE with nCh value.
modules/videoio/test/test_audio.cpp
Outdated
if (!audioFrame.empty()) | ||
if (frame != numberOfFrames-1) | ||
{ | ||
ASSERT_LT(abs(cap.get(CAP_PROP_AUDIO_POS) + (cap.get(CAP_PROP_TIME_SHIFT_STREAMS)/ 1e6) * samplePerSecond - (cap.get(CAP_PROP_POS_MSEC)/ 1000 + 1./fps) * samplePerSecond), audioSamplesTolerance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition is failed with:
paramCombination("sample_960x400_ocean_with_audio.mp4", 2, 0.15, CV_8UC3, 400, 960, 1116, 30, 30., cv::CAP_MSMF)
See audioSamplePos
comment above.
1./fps
This is wrong. Both positions must point to be beginning of the data. You just can't estimate the end for VFR videos.
Check must look like this:
EXPECT_NEAR(
cap.get(CAP_PROP_AUDIO_POS),
((cap.get(CAP_PROP_POS_MSEC) - cap.get(CAP_PROP_TIME_SHIFT_STREAMS) / 1e6) / 1000) * samplePerSecond,
audioSamplesTolerance)
<< cap.get(CAP_PROP_POS_MSEC)=" << cap.get(CAP_PROP_POS_MSEC);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check must look like this:
EXPECT_NEAR( cap.get(CAP_PROP_AUDIO_POS), ((cap.get(CAP_PROP_POS_MSEC) - cap.get(CAP_PROP_TIME_SHIFT_STREAMS) / 1e6) / 1000) * samplePerSecond, audioSamplesTolerance) << cap.get(CAP_PROP_POS_MSEC)=" << cap.get(CAP_PROP_POS_MSEC);
This is how the check should look, only for the first frame. After reading the first frame, the timestamps of audio and video will not need to align and take into account the offset
if (frame == 0)
{
EXPECT_NEAR(
cap.get(CAP_PROP_AUDIO_POS),
((cap.get(CAP_PROP_POS_MSEC) + cap.get(CAP_PROP_AUDIO_SHIFT_NSEC) / 1e6) / 1000) * samplePerSecond,
audioSamplesTolerance)
<< "CAP_PROP_POS_MSEC = " << cap.get(CAP_PROP_POS_MSEC);
}
else
{
EXPECT_NEAR(cap.get(CAP_PROP_AUDIO_POS), (cap.get(CAP_PROP_POS_MSEC) / 1000) * samplePerSecond, audioSamplesTolerance) << "CAP_PROP_POS_MSEC = " << cap.get(CAP_PROP_POS_MSEC);
EXPECT_LT(abs(audioFrame.cols - samplesPerFrame), audioSamplesTolerance);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, CAP_PROP_AUDIO_SHIFT_NSEC
must be applied for all frames.
frame == 0
check is EXPECT_EQ(0, cap.get(CAP_PROP_AUDIO_POS));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No,
CAP_PROP_AUDIO_SHIFT_NSEC
must be applied for all frames.
Calculations will be erroneous. From CAP_PROP_POS_MSEC, you need to subtract the audio offset relative to the zero of the media file, and not the time difference between the start of the audio stream and the video stream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check is updated. Please fetch the last commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for update!
modules/videoio/src/cap_msmf.cpp
Outdated
nFrame++; | ||
usedVideoSample->GetSampleDuration(&videoSampleDuration); | ||
requiredAudioTime = usedVideoSampleTime + videoSampleDuration - allTime; | ||
allTime += requiredAudioTime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allTime
unclear name which is used for audio only.
modules/videoio/src/cap_msmf.cpp
Outdated
if (residualTime*1e7 > requiredAudioTime) | ||
return true; | ||
while ((!vEOS) ? bufferedAudioDuration <= requiredAudioTime : !aEOS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
residualTime
This is some kind of duplicate for bufferedAudioDuration
(but with unclear name)
modules/videoio/src/cap_msmf.cpp
Outdated
if (returnFlag) | ||
if (!audioSamples.empty() || !bufferAudioData.empty() && aEOS) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to indent such huge code blocks:
if (!returnFlag)
{
CV_LOG_DEBUG(...);
return false;
}
... code block ...
modules/videoio/src/cap_msmf.cpp
Outdated
if (!audioSamples.back()) | ||
audioSamples.pop_back(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just do not write into array directly through &audioSamples[numberOfSamples]
.
It is better to write into local variable and register it on success after all passed checks.
modules/videoio/src/cap_msmf.cpp
Outdated
sampleTime += frameStep; | ||
audioSamples[numberOfSamples]->GetSampleDuration(&audioSampleDuration); | ||
CV_LOG_DEBUG(NULL, "videoio(MSMF): got audio frame with timestamp=" << audioSampleTime << " duration=" << audioSampleDuration); | ||
bufferedAudioDuration += (LONGLONG)(audioSampleDuration + residualTime*1e7); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
residualTime*1e7
Added multiple times in the loop (unexpected shift):
while ((!vEOS) ? bufferedAudioDuration <= requiredAudioTime : !aEOS)
modules/videoio/src/cap_msmf.cpp
Outdated
if (videoStream == -1) | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"while()" condition should handle this?
modules/videoio/test/test_audio.cpp
Outdated
void checkAudio() | ||
{ | ||
for (unsigned int nCh = 0; nCh < audioData.size(); nCh++) | ||
for (unsigned int i = 0; i < validAudioData[nCh].size(); i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
audioData[nCh].size()
check is missing
No need to have 2 identical checkAudio()
implementations.
modules/videoio/test/test_audio.cpp
Outdated
#if 0 | ||
// https://filesamples.com/samples/video/mp4/sample_960x400_ocean_with_audio.mp4 | ||
, paramCombination("sample_960x400_ocean_with_audio.mp4", 2, 0.15, CV_8UC3, 400, 960, 1116, 0, 30, 30., cv::CAP_MSMF) | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case has audio-video desync starting from frame 666 which increases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case has audio-video desync starting from frame 666 which increases.
Starting from frame 666, the audio position begins to outpace the video by more than (1.0 / fps) * 0.3, because for each output of audio data, it is necessary that the data size in bytes be a multiple of the dimension of the matrix.
chunkLengthOfBytes +=
((int)(captureAudioFormat.bit_per_sample)/8* (int)captureAudioFormat.nChannels) - chunkLengthOfBytes %
((int)(captureAudioFormat.bit_per_sample)/8* (int)captureAudioFormat.nChannels);
For this reason, with each iteration, the gap between the video and audio positions will increase and on sufficiently long media files, desynchronization of positions will be observed, but the amount of output audio data with a frame will be unchanged. Confirmed by verification:
EXPECT_NEAR(audio Frame.cols, samplesPerFrame, audioSamplesTolerance);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not necessary to return chunks of audio of the same size.
but the amount of output audio data with a frame will be unchanged
This doesn't make any sense for videos with variable frame rate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not necessary to return chunks of audio of the same size.
but the amount of output audio data with a frame will be unchanged
This doesn't make any sense for videos with variable frame rate.
Chunks of audio of the same size are returned only if the frame rate is constant. Chunks of audio are calculated based on the duration of the frame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chunks of audio are calculated based on the duration of the frame
Chunks of audio should be calculated based on the timestamp of the next video frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done 👍
add audio support in cap_msmf * audio msmf * fixed warnings * minor fix * fixed SampleTime MSMF * minor fix, fixed audio test, retrieveAudioFrame * fixed warnings * impelemented sync audio and video stream with start offset * fixed error * fixed docs * fixed audio sample * CAP_PROP_AUDIO_POS, minor fixed * fixed warnings * videoio(MSMF): update audio test checks, add debug logging * fixed * fixed desynchronization of time positions, warnings * fixed warnings * videoio(audio): tune tests checks * videoio(audio): update properties description * build warnings Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
Merge with extra: opencv/opencv_extra#864
@alalek, @allnes, @fzhar
Issue #16394
I attach a link on opencv_extra's PR opencv/opencv_extra#864 .