Initial support Blackwell GPU arch #26820
10.0 Blackwell B100/B200
12.0 Blackwell RTX 50
cc @cudawarped
@johnnynunez Which version of the CUDA toolkit are you using to compile for Blackwell? The latest version I have access to (12.6 Update 3) does not support compute capability 10. I assume Nvidia will release a version 13.0 to coincide with the release of the first Blackwell cards. Isn't Blackwell compute capability 10.0? Where does 12.0 come from?
Hello,
Hello, Thor is also compute capability 10.1.
@johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released which supports compute capability 10.0 to be 100% sure your changes don't break anything. I would also remove compute capability 12.0. Note: If you add 10.1 for Thor you also want to update this filter: opencv/cmake/OpenCVDetectCUDAUtils.cmake, line 274 in ea023b7
But why would you remove 12.0?
Well, the NDA was lifted today.
I added support in PyTorch, XLA, etc.
Can you provide me with a source to indicate that your 5090 will be compute capability 12.0?
PyTorch merged compute capability 10.0 and 12.0 without any build testing (I guess it's a Python-first library)? I can't see a reason for not waiting, especially in OpenCV, where you can manually select the compute capability using combinations of CUDA_ARCH_BIN and CUDA_ARCH_PTX.
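To illustrate the manual selection mentioned above, here is a minimal Python sketch of what a pair of `CUDA_ARCH_BIN` / `CUDA_ARCH_PTX` values conceptually expands to in terms of nvcc `-gencode` flags. This is not OpenCV's actual CMake logic; the function name `gencode_flags` and the accepted separators are assumptions made for illustration only.

```python
def gencode_flags(arch_bin: str, arch_ptx: str = "") -> list[str]:
    """Expand CUDA_ARCH_BIN / CUDA_ARCH_PTX style lists into nvcc -gencode flags.

    Hypothetical helper: it mirrors the idea behind the OpenCV options,
    not the real implementation in OpenCVDetectCUDAUtils.cmake.
    """
    def split(value: str) -> list[str]:
        # Accept ';', ',' or whitespace as separators between entries.
        return value.replace(";", " ").replace(",", " ").split()

    def digits(cc: str) -> str:
        # "12.0" -> "120", "8.6" -> "86", "10" -> "100"
        major, _, minor = cc.partition(".")
        return f"{int(major)}{int(minor or 0)}"

    flags = []
    for cc in split(arch_bin):
        n = digits(cc)
        # Binary (SASS) code for one specific architecture.
        flags.append(f"-gencode arch=compute_{n},code=sm_{n}")
    for cc in split(arch_ptx):
        n = digits(cc)
        # PTX that the driver can JIT-compile for newer architectures.
        flags.append(f"-gencode arch=compute_{n},code=compute_{n}")
    return flags


if __name__ == "__main__":
    for flag in gencode_flags("10.0;12.0", "12.0"):
        print(flag)
```

With such a mapping in mind, a user who does not want to wait for defaults to change could pass something like `-DCUDA_ARCH_BIN="10.0;12.0" -DCUDA_ARCH_PTX="12.0"` to CMake, provided the installed CUDA toolkit actually supports those architectures.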
more references:
@johnnynunez 🤯 Nvidia is going up 3 compute capabilities within a single generation. That's going to be really confusing, considering they previously used one compute capability version per generation. What is the output from
If it says
Do you think it's possible that the driver is outputting the wrong info? @asmorkalov Either way, I would suggest it would be better to wait until more info is available before merging this PR.
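One way to read the reported compute capability programmatically is sketched below. It assumes a driver recent enough that `nvidia-smi` supports the `compute_cap` query field; the helper names `parse_compute_caps` and `query_compute_caps` are hypothetical, and the sketch only reports what the driver says, so it cannot rule out the driver itself being wrong.

```python
import subprocess


def parse_compute_caps(csv_text: str) -> list[tuple[str, tuple[int, int]]]:
    """Parse 'name, major.minor' lines into (name, (major, minor)) pairs."""
    result = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so GPU names containing commas survive.
        name, _, cap = line.rpartition(",")
        major, _, minor = cap.strip().partition(".")
        result.append((name.strip(), (int(major), int(minor or 0))))
    return result


def query_compute_caps() -> list[tuple[str, tuple[int, int]]]:
    """Ask nvidia-smi for each GPU's name and compute capability."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_compute_caps(out)


if __name__ == "__main__":
    for name, (major, minor) in query_compute_caps():
        print(f"{name}: compute capability {major}.{minor}")
```

Keeping the parsing separate from the `nvidia-smi` call means the parser can be checked on a machine without a GPU.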
It's okay, flash attention v4 is coming for Blackwell too. They have 100 and 120.
I'll share it in the next few hours because I'm not at home. But we have the press release driver 571.86 whl
@cudawarped Thanks a lot for the analysis. I'll review the hardware specs and get back to you soon.
The new drivers are showing CUDA 12.8 and the same 12.0 codename.
more references: 10.0 B100/B200
@johnnynunez I still suggest we wait. I can't see any downside for OpenCV because the compute capability can be manually selected. I still wouldn't be surprised if the 12 is the version of the CUDA toolkit, shown because the returned compute capability is not valid, since the CUDA toolkit versions used by PyTorch and used to compile nvidia-smi predate compute capabilities >= 9.0.
Yeah! Totally agree.
@johnnynunez Compute capability 12 looks to be official. Consumer cards look to have fewer resident threads per SM.
@asmorkalov Builds on both Windows 11 and Ubuntu 22.04 (WSL) with CUDA Toolkit 12.8 using the default architecture selection (
and
Yeah, I compiled PyTorch, xformers, etc. and OpenCV with my RTX 5090 and it works. I just couldn't comment on anything because of the NDA. But it was lifted yesterday.
@asmorkalov feel free to merge! Thanks
Initial support Blackwell GPU arch opencv#26820

10.0 blackwell b100/b200
12.0 blackwell rtx50

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake