Home

GPU-NMS Benchmarking Framework

This repository holds the CUDA source code implementing the algorithm described in the paper "Work-Efficient Parallel Non-Maximum Suppression Kernels". The proposed NMS CUDA kernels are designed for GPU-only video processing pipelines and are meant to run after a convolutional neural network has finished inference and returned the coordinates of the localized objects.

Build Instructions

Edit the Makefile and set the SM_ARCH and GPU_ARCH variables to the architecture of your NVIDIA GPU platform. By default, the Makefile targets an NVIDIA Tesla T4 (sm_75 / compute_75), but the code has also been extensively tested on other platforms: the NVIDIA Jetson TK1 (sm_32 / compute_32), TX1 (sm_53 / compute_53), and TX2 (sm_62 / compute_62), and the NVIDIA GeForce GTX 1060 (sm_61 / compute_61).
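The sm_XX / compute_XX names are the GPU's compute capability with the dot removed (7.5 becomes sm_75 / compute_75 for the Tesla T4). The sketch below shows that mapping as a small shell helper. The function name arch_flags is hypothetical and not part of this repository. It also assumes that SM_ARCH holds the sm_XX value and GPU_ARCH the compute_XX value, so check the Makefile for the exact format it expects.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): convert a compute
# capability such as "6.1" into the sm_XX / compute_XX names.
# Assumption: SM_ARCH takes the sm_XX value and GPU_ARCH the compute_XX value.
arch_flags() {
    cc=$(printf '%s' "$1" | tr -d '.')   # "6.1" -> "61"
    printf 'SM_ARCH=sm_%s GPU_ARCH=compute_%s\n' "$cc" "$cc"
}

arch_flags 6.1   # GeForce GTX 1060 from the list above
```

If the Makefile assigns these variables in the usual way (without the override directive), the same values can probably be passed on the command line instead of editing the file, for example make $(arch_flags 6.1). On drivers that support the compute_cap query field, nvidia-smi --query-gpu=compute_cap --format=csv,noheader may report the capability directly. Jetson boards typically need it looked up in NVIDIA's documentation.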

After updating the Makefile, compile and run the test application, which executes the NMS kernels on the GPU to merge the candidate windows of all detected objects:

$ make

nvcc -o nms.o -c nms.cu -O3 -gencode=arch=compute_75,code=sm_75
gcc -I/usr/local/cuda/include -onmstest nmstest.o nms.o -L/usr/local/cuda/lib64 -lcudart -lcuda

$ ./nmstest detections.txt output.txt

CUDA Runtime Version 11000
Device 0# Tesla T4	 [1.59 GHz - 40 Multiprocessors - Core sm_75 - 15109 MB]
Device 0# has been selected for CUDA computation
Detections read from input file (detections.txt): 2997
NMS-MAP elapsed time: 0.653 ms
NMS-REDUCE elapsed time: 0.139 ms
Detections after NMS: 145

Finally, execute the drawrectangles script to generate a PNG file (oscarsdets.png) containing the merged candidate windows:

$ ./drawrectangles output.txt

$ eog oscarsdets.png

Troubleshooting / FAQ

Q: Running the test application displays the message CUDA Error: no kernel image is available for execution on the device. What am I doing wrong?

A: This issue arises when the CUDA kernels have been compiled for an architecture that does not match the GPU on which the test application runs. Update the SM_ARCH and GPU_ARCH variables in the Makefile with the sm_XX and compute_XX values matching your GPU platform, then recompile the code and run the test application again.
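One quick way to confirm which architecture was actually compiled is to read it back from the nvcc command line that make prints. The sketch below extracts the code=sm_XX value from such a line. The helper name built_arch is hypothetical, and the sample input is the nvcc line from the build output shown in the Build Instructions.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the sm_XX target
# found in any nvcc command line read from standard input.
built_arch() {
    sed -n 's/.*code=\(sm_[0-9][0-9]*\).*/\1/p'
}

# Sample input: the nvcc line from the default build output.
printf '%s\n' 'nvcc -o nms.o -c nms.cu -O3 -gencode=arch=compute_75,code=sm_75' | built_arch
```

For the default build this prints sm_75. In practice the helper would be fed the real build log, for example make -B 2>&1 | built_arch with GNU make, where -B forces a rebuild so that the nvcc line is printed again. If the reported value differs from your GPU's architecture, the Makefile variables still need updating.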
