hertasecurity edited this page Aug 21, 2020 · 3 revisions
GPU-NMS Benchmarking Framework
This repository holds the CUDA source code implementing the algorithm described in the paper "Work-Efficient Parallel Non-Maximum Suppression Kernels". The proposed NMS CUDA kernels are designed for GPU-only video processing pipelines and are meant to run after inference by a convolutional neural network that returns the coordinates of the localized objects.
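For orientation, NMS can be split into a map step that compares every pair of detections and a reduce step that collapses those comparisons into one keep/discard flag per detection. The CUDA sketch below illustrates that general idea only; it is not the repository's code or the paper's optimized kernels. The Box layout, the kernel names nms_map and nms_reduce, and the thresh parameter are assumptions made for this example, and the reduce step is a plain per-row loop kept simple for clarity.

```cuda
#include <cmath>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical box layout for this sketch: top-left corner, size, and score.
struct Box { float x, y, w, h, score; };

// Intersection-over-union of two axis-aligned boxes.
__host__ __device__ inline float iou(const Box& a, const Box& b) {
    float x1 = fmaxf(a.x, b.x);
    float y1 = fmaxf(a.y, b.y);
    float x2 = fminf(a.x + a.w, b.x + b.w);
    float y2 = fminf(a.y + a.h, b.y + b.h);
    float inter = fmaxf(0.0f, x2 - x1) * fmaxf(0.0f, y2 - y1);
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Map: one thread per (i, j) pair. keep[i * n + j] is 0 when box j overlaps
// box i above the threshold and has a strictly higher score, 1 otherwise.
__global__ void nms_map(const Box* boxes, unsigned char* keep, int n, float thresh) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    if (i >= n || j >= n) return;
    bool suppressed = (i != j) &&
                      boxes[j].score > boxes[i].score &&
                      iou(boxes[i], boxes[j]) > thresh;
    keep[(size_t)i * n + j] = suppressed ? 0 : 1;
}

// Reduce: one thread per box. Box i survives only if no other box
// suppressed it, i.e. the logical AND of row i of the keep matrix is 1.
__global__ void nms_reduce(const unsigned char* keep, unsigned char* alive, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned char a = 1;
    for (int j = 0; j < n; ++j) a &= keep[(size_t)i * n + j];
    alive[i] = a;
}
```

With this layout, nms_map would be launched on a 2D grid covering all n x n pairs (for example 16x16 threads per block), and nms_reduce on a 1D grid with one thread per detection. This pairwise formulation can give different results from greedy sequential NMS, because a box may be suppressed by a neighbor that is itself suppressed.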
Build Instructions
Edit the Makefile and set the SM_ARCH and GPU_ARCH variables to the architecture of your NVIDIA GPU platform. By default, the Makefile targets an NVIDIA Tesla T4 (sm_75 / compute_75), but the code has also been extensively tested on other platforms: NVIDIA Jetson TK1 (sm_32 / compute_32), TX1 (sm_53 / compute_53), TX2 (sm_62 / compute_62), and NVIDIA GeForce GTX 1060 (sm_61 / compute_61).
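If you are unsure which architecture your GPU has, the CUDA runtime can report it. The sketch below is not part of this repository; the helper names arch_suffix and print_device_archs are hypothetical. It uses cudaGetDeviceCount and cudaGetDeviceProperties to print the sm_XX / compute_XX pair for each visible device, built from the major and minor compute capability.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Turn a compute capability such as 7.5 into the two-digit suffix 75.
inline int arch_suffix(int major, int minor) {
    return major * 10 + minor;
}

// Print the sm_XX / compute_XX pair of every visible CUDA device.
// Returns the number of devices found, or -1 if the runtime query failed.
inline int print_device_archs() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return -1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "Device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        int xx = arch_suffix(prop.major, prop.minor);
        std::printf("Device %d: %s -> sm_%d / compute_%d\n",
                    dev, prop.name, xx, xx);
    }
    return count;
}
```

Calling print_device_archs() from a small main function compiled with nvcc lists the values to use; for the Tesla T4 mentioned above, the reported pair would be sm_75 / compute_75.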
After updating the Makefile, compile and run the test application, which executes the NMS kernels on the GPU to merge the candidate windows of all detected objects. Its output should resemble the following:
CUDA Runtime Version 11000
Device 0# Tesla T4 [1.59 GHz - 40 Multiprocessors - Core sm_75 - 15109 MB]
Device 0# has been selected for CUDA computation
Detections read from input file (detections.txt): 2997
NMS-MAP elapsed time: 0.653 ms
NMS-REDUCE elapsed time: 0.139 ms
Detections after NMS: 145
Finally, execute the drawrectangles script to generate a PNG file (oscarsdets.png) containing the merged candidate windows:
$ ./drawrectangles output.txt
$ eog oscarsdets.png
Troubleshooting / FAQ
Q: When I run the test application, it displays the message "CUDA Error: no kernel image is available for execution on the device". What am I doing wrong?
A: This issue arises when the CUDA kernels have been compiled for an architecture that does not match the GPU on which the test application is executed. Update the SM_ARCH and GPU_ARCH variables in the Makefile with the sm_XX and compute_XX architecture matching your GPU platform, then recompile the code and run the test application again.