[G-API] Support postprocessing for not argmaxed outputs #20476
* Add static_cast to uint8_t
@dmatveev Could you have a look?
LGTM if the existing case is not broken with this change.
void classesToColors(const cv::Mat &out_blob,
                     cv::Mat &mask_img) {
    const int H = out_blob.size[0];
    const int W = out_blob.size[1];

    mask_img.create(H, W, CV_8UC3);
    GAPI_Assert(out_blob.type() == CV_8UC1);
    const uint8_t* const classes = out_blob.ptr<uint8_t>();

    for (int rowId = 0; rowId < H; ++rowId) {
        for (int colId = 0; colId < W; ++colId) {
            uint8_t class_id = classes[rowId * W + colId];
            mask_img.at<cv::Vec3b>(rowId, colId) =
                class_id < colors.size()
                ? colors[class_id]
                : cv::Vec3b{0, 0, 0}; // NB: sample supports 20 classes
        }
    }
}
Can this be expressed with our graph operators? Just wondering
Do you mean calling this function inside the user kernel? Or expressing this algorithm using already existing operations?
cv::resize(mask_img, out, in.size());
const float blending = 0.3f;
out = in * blending + out * (1 - blending);
Can this be moved to the graph level, too? Not critical to do it right now, but worth considering for the future.
At the graph level the cv::Size parameter is unknown, isn't it?
It could obviously be done with a custom resize operation.
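For reference, the blending expression in the snippet above is a per-channel weighted sum. A minimal sketch of that arithmetic in plain C++, without OpenCV, might look like the following. The helper name blendChannel is hypothetical, and the rounding and clamping here only approximate what OpenCV does when it stores the result back into an 8-bit image:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of the PR): blends one 8-bit channel of the
// input frame with the same channel of the color mask.
// blending is the weight of the frame; (1 - blending) is the weight of the mask.
std::uint8_t blendChannel(std::uint8_t frame, std::uint8_t mask, float blending) {
    const float mixed = frame * blending + mask * (1.0f - blending);
    // Round to nearest and clamp to the valid 8-bit range before narrowing.
    const float clamped = std::min(255.0f, std::max(0.0f, std::round(mixed)));
    return static_cast<std::uint8_t>(clamped);
}
```

With blending = 0.3f, as in the sample, the mask contributes 70% of every output pixel and the original frame 30%.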
// NB: If output has more than single plane, it contains probabilities
// otherwise class id.
Is this robust enough? Maybe an explicit enum flag would be better? I just don't know.
What do you mean by an enum flag? In that case you would need to match the model name with a postprocessing enum flag, right?
I don't think it's a great solution, I just tried not to overdesign it.
@alalek Can it be merged?
…amvid-0001-segm-sample
[G-API] Support postprocessing for not argmaxed outputs
* Support postprocessing for not argmaxed outputs
* Fix typo
* Add assert
* Remove static cast
* CamelCast to snake_case
* Fix windows warning
* Add static_cast to uint8_t
* Add const to variables
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.
Overview
Some semantic segmentation networks, such as unet-camvid-0001 from OMZ, produce multi-plane output (1 x num_classes x H x W). In that case an argmax operation must be performed for every pixel across the channel planes in order to convert the output to a 1 x 1 x H x W representation where every pixel is a class id.
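The per-pixel argmax described above can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the code from the patch: the helper name argmaxOverPlanes is hypothetical, the scores are assumed to be a contiguous float buffer stored plane by plane (NCHW with N = 1), and the number of classes is assumed to fit into uint8_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): converts a 1 x C x H x W buffer of
// per-class scores, stored plane by plane, into H x W class ids.
std::vector<std::uint8_t> argmaxOverPlanes(const std::vector<float>& probs,
                                           int C, int H, int W) {
    const std::size_t plane = static_cast<std::size_t>(H) * W;
    std::vector<std::uint8_t> classes(plane, 0);
    for (std::size_t px = 0; px < plane; ++px) {
        float best = probs[px];  // score of class 0 at this pixel
        for (int c = 1; c < C; ++c) {
            const float score = probs[c * plane + px];
            if (score > best) {  // ties keep the lowest class id
                best = score;
                classes[px] = static_cast<std::uint8_t>(c);
            }
        }
    }
    return classes;
}
```

The resulting buffer has the same single-plane layout as an output that was already argmaxed by the network, so the same class-to-color mapping can be applied to both.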
Build configuration