Release Version 0.8.0: nvFatbin, CUDA 12.x features, sync-async op unification, static targets etc. · eyalroz/cuda-api-wrappers · GitHub
Changes since v0.7.1:
Support for the nvFatbin library (#681)
- The API wrappers now support NVIDIA's "fat binary" file format creation/marshalling library, nvFatbin. It is supported via a `cuda::fatbin_builder_t` class: One creates a builder, adds various fragments of fatbin-contained content (cubin, PTX, LTO IR etc.), then finally uses the `build()` or `build_at()` method to obtain the completed, final, fatbin file data, in a region of memory.
- The project's CMake now exports a new target, `cuda-api-wrappers::fatbin`, which one should depend on when actually using the builder.
- NVIDIA has not fully documented this library, so some functionality is not fully articulated, and some is only partially supported (specifically, passing extra options when adding LTO IR or PTX).
Support for more CUDA 12.x features
- #669 : Can now obtain the kernels available in a given `cuda::module_t`, with the method `unique_span<kernel_t> get_kernels() const`.
- #670 : Can now obtain a kernel's name and the module containing it via the kernel's handle. However, only the mangled kernel name is accessible, so the method is named accordingly: `kernel_t::mangled_name()` (regards #674).
- #675 : Can now query CUDA's module loading mode (lazy or eager).
(Note: these features are not accessible if you're using the wrappers with CUDA 11.x.)
More `unique_span` class changes
Like a recently-cut gem, which one slowly polishes until it gains its proper shine... we had some work on unique_span in version 0.7.1 as well, and it continues in this version:
- #678 : The deleter is now instance-specific, so it is possible to allocate in more than one way depending even on the span size - and also have the use of such unique-spans decoupled from the allocation decisions. Also, the deleter takes a span, not just a pointer, so it can make decisions based on the allocation size.
- #665 :
  - Simplified the `swap()` implementation
  - Removed some redundant code
  - Shortened some code
  - Can now properly convert from a span of `T` to a span of `const T`.
  - Neither `release()`, nor our move construction, can be `noexcept`, so that marking, which was based only on optimism, has been removed.
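The instance-specific deleter described in #678 can be illustrated with a small standalone sketch. This is not the library's `unique_span` implementation: the names `span_sketch`, `unique_span_sketch` and `make_ints`, the size threshold of 16, and the use of `std::function` to hold the deleter are all assumptions made for illustration. It only shows how a deleter stored per instance, and receiving the whole span, lets the allocation strategy vary with size without the user of the span needing to know.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <utility>

// Hypothetical, simplified stand-in for a span: a pointer plus an element count.
template <typename T>
struct span_sketch {
    T* data;
    std::size_t size;
};

// Hypothetical owning span whose deleter is stored per instance
// and receives the whole span rather than just a pointer.
template <typename T>
class unique_span_sketch {
public:
    using deleter_type = std::function<void(span_sketch<T>)>;

    unique_span_sketch(span_sketch<T> s, deleter_type d)
        : span_(s), deleter_(std::move(d)) {}

    unique_span_sketch(const unique_span_sketch&) = delete;
    unique_span_sketch& operator=(const unique_span_sketch&) = delete;

    // Deliberately not marked noexcept in this sketch.
    unique_span_sketch(unique_span_sketch&& other)
        : span_(other.span_), deleter_(std::move(other.deleter_))
    {
        other.span_ = {nullptr, 0}; // the moved-from object owns nothing
    }

    ~unique_span_sketch()
    {
        if (span_.data != nullptr && deleter_) { deleter_(span_); }
    }

    T* data() const { return span_.data; }
    std::size_t size() const { return span_.size; }

private:
    span_sketch<T> span_;
    deleter_type deleter_;
};

// Chooses an allocation strategy by size and stores the matching deleter
// in the returned instance. `freed_elements` just counts what was released.
inline unique_span_sketch<int> make_ints(std::size_t n, std::size_t& freed_elements)
{
    if (n <= 16) {
        return { {new int[n], n},
                 [&freed_elements](span_sketch<int> s) {
                     delete[] s.data;
                     freed_elements += s.size;
                 } };
    }
    return { {static_cast<int*>(std::malloc(n * sizeof(int))), n},
             [&freed_elements](span_sketch<int> s) {
                 std::free(s.data);
                 freed_elements += s.size;
             } };
}
```

Code that receives a `unique_span_sketch<int>` never needs to know which of the two allocation paths produced it, which is the decoupling the release note refers to.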
`optional_ref` & partial unification of async and non-async memory operations
- #691 : Added an `optional_ref` class, for passing optional arguments which are references. See this blog post by Foonathan about the problems of putting references in C++ `optional`s.
- #689 : Memory-related operations which had both `cuda::memory::foo()` and `cuda::memory::async::foo()` variants now have a single variant, `cuda::memory::foo()`, which takes an extra `optional_ref<stream_t>` parameter: When it's not set, the operation is synchronous; when it is set, the operation is asynchronous and scheduled on the stream. (But note the "fine print" w.r.t. synchronous and asynchronous CUDA operations in the Runtime and Driver API reference documents.)
- #688 : Can now asynchronously copy 2D and 3D data using "copy parameters" structures.
- #687 : The synchronous and asynchronous variants of `copy_single()` had disagreed - one took a pointer, the other a reference. With their unification, they now agree (and take a pointer).
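The idea behind the unified signatures can be sketched without CUDA at all. The following is a minimal standalone illustration, not the library's code: `optional_ref_sketch`, `stream_sketch` and `copy_sketch` are hypothetical names, and the "asynchronous" branch merely records that an operation was scheduled instead of deferring any work.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a stream; it only counts scheduled operations.
struct stream_sketch {
    int scheduled_ops = 0;
};

// A minimal optional reference: either empty, or referring to an existing object.
template <typename T>
class optional_ref_sketch {
public:
    optional_ref_sketch() = default;              // "not set"
    optional_ref_sketch(T& ref) : ptr_(&ref) {}   // "set"; implicit, so an lvalue can be passed directly
    bool has_value() const { return ptr_ != nullptr; }
    T& value() const { return *ptr_; }
private:
    T* ptr_ = nullptr;
};

// One function covering both variants: without a stream it behaves synchronously,
// with a stream it reports the operation as scheduled on that stream.
// (In this sketch the copy itself always happens immediately.)
inline std::string copy_sketch(int* dst, const int* src, std::size_t n,
                               optional_ref_sketch<stream_sketch> stream = {})
{
    for (std::size_t i = 0; i < n; ++i) { dst[i] = src[i]; }
    if (stream.has_value()) {
        ++stream.value().scheduled_ops;
        return "async";
    }
    return "sync";
}
```

A caller writes `copy_sketch(dst, src, n)` for the synchronous form and `copy_sketch(dst, src, n, my_stream)` for the stream-scheduled form, so only one function name needs to exist.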
Bug fixes
Poor man's optional class
- #664, #666 : Tweaked the class to avoid build failure in MSVC
- #676 : `value_or()` now returns a value...
- #682 : `value_or()` is now const
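For readers unfamiliar with why these two fixes matter, here is a minimal sketch of a const-qualified `value_or()` that returns by value. It is not the library's class: `poor_optional_sketch` is a hypothetical name, and the sketch assumes `T` is default-constructible and copyable to stay short.

```cpp
#include <cassert>

// Hypothetical, simplified optional; assumes T is default-constructible and copyable.
template <typename T>
class poor_optional_sketch {
public:
    poor_optional_sketch() = default;
    poor_optional_sketch(const T& v) : has_(true), value_(v) {}

    bool has_value() const { return has_; }

    // const-qualified, so it can be called on const objects,
    // and returning a value in both the engaged and the empty case.
    T value_or(const T& fallback) const { return has_ ? value_ : fallback; }

private:
    bool has_ = false;
    T value_{};
};
```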
In example programs
- #672 : The simpleCudaGraphs example program was crashing due to a gratuitous setting of launch config settings
- #673 : Potential use-before-initialization in the simpleIPC example
Other changes
Build mechanism
- #699 : Now exposing targets with a `_static` suffix, which in turn depend on the static versions of CUDA libraries, when those are available. For example, we now have both `cuda-api-wrappers::rtc` and `cuda-api-wrappers::rtc_static`.
- #694 : Now properly building fatbin files on systems with multiple GPUs of different compute capabilities
In the wrapper APIs themselves
- #667 : Added `noexcept` designators which were missing from some methods of the dimension classes
- #671 : A bit of macro renaming to avoid clashing with other libraries
- #684 : Taking more linear sizes as `size_t`'s in `launch_config_builder_t`'s methods - so as to prevent narrowing-cast warnings and to check limits ourselves.
- #686 : When loading a kernel from a library, one can now specify the context in which to obtain the kernel.
- #698 : Add a shortcut function for getting the default device: `cuda::device::default()`.
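The pattern mentioned in #684, accepting a wide `size_t` and checking the limit before narrowing, can be sketched as follows. This is a generic illustration rather than the library's code: `checked_dimension` is a hypothetical helper, and using `unsigned` as the narrower target type is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper: take the size as size_t so callers get no narrowing-cast
// warning, check the limit explicitly, and only then narrow.
inline unsigned checked_dimension(std::size_t requested)
{
    if (requested > std::numeric_limits<unsigned>::max()) {
        throw std::invalid_argument("dimension exceeds the representable range");
    }
    return static_cast<unsigned>(requested);
}
```

Doing the check in one place turns a silent truncation at the call site into an explicit error.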
In example programs