Releases · CNugteren/CLBlast
Release timestamps, newest first (years not shown): 13 Jun 17:50, 09 Feb 20:40, 09 Jul 09:30, 21 May 19:22, 29 Sep 18:46, 20 Jan 13:22, 18 Feb 09:38, 04 Dec 21:10, 14 Jul 10:30, 03 Jun 11:27
CLBlast 1.6.3
2a08197
CLBlast version 1.6.3. Changes since previous release (version 1.6.2):
- Fixed a bug in the GEMMK=1 kernel (with 2D register tiling) when MWG!=NWG
- CMake fixes for older versions and for the CUDA backend
- Added tuned parameters for many devices (see doc/tuning.md)
CLBlast 1.6.2
faa2109
CLBlast version 1.6.2. Changes since previous release (version 1.6.1):
- Fix a bug in the pre-processor that would cause issues on Arm GPUs
- Fix DLL install directory in mingw
- Modifications to the Python bindings (pyclblast):
  - Convert float scalar values to cl_half for fp16 routines
  - Amax/amin, max/min routines accept unsigned integer buffers for index
  - Switch to pyproject.toml file for installing Python bindings
  - Build Python bindings using CMake, adding Windows support
- Generator script now always uses LF endings, independent of the platform
- Added tuned parameters for many devices (see doc/tuning.md)
CLBlast 1.6.1
e3ce21b
CLBlast version 1.6.1. Changes since previous release (version 1.6.0):
- Fix pointer error in pyclblast on ARM
- Fix a multithreading bug related to storing objects in the cache
- Added tuned parameters for many devices (see doc/tuning.md)
CLBlast 1.6.0
b0b3028
CLBlast version 1.6.0. Changes since previous release (version 1.5.3):
- Improved performance on Qualcomm Adreno GPUs:
  - Unique database entries for specific Adreno devices
  - Toggle OpenCL kernel compilation options for Adreno
  - New preprocessor directive RELAX_WORKGROUP_SIZE
- Fixed a bug in handling of #undef in CLBlast loop unrolling and array-to-register mapping functions
- Fixed a bug in XAMAX/XAMIN routines related to inadvertently including the increment and offset in the result
- Fixed a bug in XAMAX/XAMIN routines that would cause only the real part of a complex number to be taken into account
- Fixed a bug that caused tests to not properly do integer-output testing (for XAMAX/XAMIN)
- Fixed a minor issue with the expected input buffer size in the TRMV/TBMV/TPMV/TRSV routines
- Fixed crashes on Android related to calling clReleaseProgram
- Fixed two small issues in the plotting script
- Fixed a documentation bug in the 'ld' requirements
- Enabled GitHub Actions CI builds for testing and releasing
- Various minor fixes and enhancements
- Added tuned parameters for various devices (see doc/tuning.md)
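The XAMAX/XAMIN fixes above are about what the routine returns. The following is a minimal, hypothetical C reference sketch (not CLBlast's kernel or API) of single-precision complex AMAX, assuming interleaved complex storage and a 0-based result. It follows the reference BLAS ICAMAX convention of using |re| + |im| as the magnitude, keeps the first index among equal maxima (the behaviour targeted in version 1.5.2), and reports a position within the logical vector so that neither the buffer offset nor the increment leaks into the result.

```c
#include <stddef.h>

/* |re| + |im| of complex element i in an interleaved (re, im, re, im, ...) array. */
static float cabs1_sketch(const float *x, size_t i) {
  const float re = x[2 * i], im = x[2 * i + 1];
  return (re < 0.0f ? -re : re) + (im < 0.0f ? -im : im);
}

/* Hypothetical reference for single-precision complex AMAX (ICAMAX-style).
   - offset and inc are counted in complex elements
   - both the real and the imaginary part contribute to the magnitude
   - the strict '>' keeps the first index among equal maxima
   - the result is a 0-based position within the logical vector, so it
     includes neither the buffer offset nor the increment
   Returns 0 when n == 0 (an assumption of this sketch). */
size_t icamax_sketch(size_t n, const float *buffer, size_t offset, size_t inc) {
  const float *x = buffer + 2 * offset;
  size_t best = 0;
  float best_val = -1.0f;
  for (size_t i = 0; i < n; ++i) {
    const float v = cabs1_sketch(x, i * inc);
    if (v > best_val) {
      best_val = v;
      best = i;
    }
  }
  return best;
}
```

For the vector {1+0i, 0+3i, 3+0i, -2-1i}, this sketch returns index 1; a comparison on the real part alone would instead pick index 2.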
CLBlast 1.5.3
d55840e
CLBlast version 1.5.3. Changes since previous release (version 1.5.2):
- Fix a correctness issue with DGEMM on SM 7.5 Turing GPUs
- Update cl.hpp to the new opencl.hpp header in the samples
- Changed the complex sum routine to return the complex sum instead of the absolute complex sum.
- Various minor fixes and enhancements
- Added tuned parameters for various devices (see doc/tuning.md)
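To make the 1.5.3 change to the complex sum concrete, here is a small C sketch contrasting the two kinds of result. The type and function names are hypothetical and are not CLBlast's API. A plain complex sum adds the values themselves and yields a complex number, whereas an absolute sum in the BLAS SCASUM sense adds |re| + |im| of every element and yields a single real value.

```c
#include <stddef.h>

/* Hypothetical complex type, for illustration only. */
typedef struct { float re; float im; } cfloat_t;

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Complex sum: adds the complex values themselves (result is complex). */
cfloat_t csum_sketch(size_t n, const cfloat_t *x) {
  cfloat_t s = {0.0f, 0.0f};
  for (size_t i = 0; i < n; ++i) {
    s.re += x[i].re;
    s.im += x[i].im;
  }
  return s;
}

/* Absolute sum in the BLAS SCASUM style: sum of |re| + |im| (result is real). */
float casum_sketch(size_t n, const cfloat_t *x) {
  float s = 0.0f;
  for (size_t i = 0; i < n; ++i) {
    s += abs_f(x[i].re) + abs_f(x[i].im);
  }
  return s;
}
```

For {1+2i, -3+1i}, the complex sum is -2+3i, while the absolute sum is 7.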
CLBlast 1.5.2
CLBlast version 1.5.2. Changes since previous release (version 1.5.1):
- Changed XAMAX/XAMIN to more likely return first rather than last min/max index, updated API docs
- Added batched routines to pyclblast
- Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering
- Several small improvements to the benchmark script (thanks to 'baryluk')
- Fixed a bug in the caching when using a context with multiple devices
- Fixed a bug in the tuners related to the global workgroup size not being a multiple of the local workgroup size
- Various minor fixes and enhancements
- Added tuned parameters for various devices (see doc/tuning.md)
CLBlast 1.5.1
CLBlast version 1.5.1. Changes since previous release (version 1.5.0):
- Implemented single-kernel version of convolution as GEMM
- Now catches all exceptions thrown by the tuners
- Fixed a bug in ISAMIN kernel
- Fixed an out-of-bounds read/write in the XHAD routine (thanks to etomzak)
- Various minor fixes and enhancements
- Added tuned parameters for various devices (see doc/tuning.md)
CLBlast 1.5.0
CLBlast version 1.5.0. Changes since previous release (version 1.4.1):
- Added support for shuffle instructions for NVIDIA GPUs (thanks to 'tyler-utah')
- Added an option to compile the Netlib API with static OpenCL device and context (-DNETLIB_PERSISTENT_OPENCL=ON)
- Added a FAQ page to the documentation
- The tuners now check beforehand on invalid local thread sizes and skip those completely
- Made the tuning API (OverrideParameters) more flexible, disregarding superfluous parameters
- Fixed an issue where the conjugate transpose was not executed in certain cases, affecting XOMATCOPY among others
- Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel
- Fixed an issue with the preprocessor and the new GEMMK == 1 kernel
- Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel
- Fixed an issue for certain parameters for AXPY's 'XaxpyFaster' kernel
- Various minor fixes and enhancements
- Added non-BLAS routines:
  - SCONVGEMM/DCONVGEMM/HCONVGEMM (convolution as im2col followed by batched GEMM)
  - SCOL2IM/DCOL2IM/CCOL2IM/ZCOL2IM/HCOL2IM (col2im transform as used in machine learning)
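The convolution-as-GEMM routines listed above combine an im2col transform with a matrix multiplication. Below is a minimal, single-image C sketch of that idea, assuming a CHW image layout, no dilation, and a row-major column matrix. The function names are hypothetical; CLBlast's own routines operate on OpenCL buffers and batch the GEMM over images, so consult its API reference for the real signatures.

```c
#include <stddef.h>

/* Output size along one spatial dimension (no dilation assumed). */
static int conv_out_size(int in, int k, int pad, int stride) {
  return (in + 2 * pad - k) / stride + 1;
}

/* im2col sketch: unfolds a CHW image into a row-major matrix with
   (c * kh * kw) rows and (out_h * out_w) columns. Each column holds one
   receptive field; positions that fall in the zero padding are written as 0. */
void im2col_sketch(const float *im, int c, int h, int w, int kh, int kw,
                   int pad, int stride, float *col) {
  const int oh = conv_out_size(h, kh, pad, stride);
  const int ow = conv_out_size(w, kw, pad, stride);
  for (int ch = 0; ch < c; ++ch) {
    for (int ky = 0; ky < kh; ++ky) {
      for (int kx = 0; kx < kw; ++kx) {
        const int row = (ch * kh + ky) * kw + kx;
        for (int oy = 0; oy < oh; ++oy) {
          for (int ox = 0; ox < ow; ++ox) {
            const int iy = oy * stride - pad + ky;
            const int ix = ox * stride - pad + kx;
            const int inside = iy >= 0 && iy < h && ix >= 0 && ix < w;
            col[(size_t)row * oh * ow + (size_t)(oy * ow + ox)] =
                inside ? im[(ch * h + iy) * w + ix] : 0.0f;
          }
        }
      }
    }
  }
}

/* Convolution as a plain (non-batched) GEMM:
   out[k][p] = sum over r of weights[k][r] * col[r][p],
   where weights is num_kernels x patch and col is patch x pixels. */
void conv_as_gemm_sketch(const float *weights, const float *col,
                         int num_kernels, int patch, int pixels, float *out) {
  for (int k = 0; k < num_kernels; ++k) {
    for (int p = 0; p < pixels; ++p) {
      float acc = 0.0f;
      for (int r = 0; r < patch; ++r) {
        acc += weights[k * patch + r] * col[(size_t)r * pixels + p];
      }
      out[k * pixels + p] = acc;
    }
  }
}
```

With a 3x3 image holding 1..9, a single 2x2 kernel of ones, no padding and stride 1, the sketch yields the 2x2 output 12, 16, 24, 28. A col2im transform typically goes the other way, scattering each column entry back to its image position and summing overlapping contributions.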
CLBlast 1.4.1
CLBlast version 1.4.1 (bugfix release). Changes since previous release (version 1.4.0):
- Fixed an access violation under Windows upon releasing the OpenCL program when the driver is already unloaded
- Fixed an issue with double cl_program release in the CLBlast caching system
- Added tuned parameters for various devices (see doc/tuning.md)
CLBlast 1.4.0
CLBlast version 1.4.0. Changes since previous release (version 1.3.0):
- Added Python interface to CLBlast 'PyCLBlast'
- Added CLBlast to Ubuntu PPA and macOS Homebrew package managers
- Added an API to run the tuners programmatically without any I/O
- Improved the performance potential by adding a second tunable GEMM kernel with 2D register tiling
- Added support for Intel specific subgroup shuffling extensions for faster GEMM on Intel GPUs
- Re-added a local memory size constraint to the tuners
- The routine tuners now automatically pick up tuning results from disk from the kernel tuners
- Updated and reorganised the CLBlast documentation
- Added a 'canary' region to check for overflows in the tuner and tests (inspired by clARMOR)
- Added an option to test against and compare performance with Intel's MKL
- Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program
- Fixed incorrect releasing of the OpenCL program resulting in segfaults / access violations
- Various minor fixes and enhancements
- Added tuned parameters for various devices (see doc/tuning.md)
- Added non-BLAS level-1 routines:
  - SHAD/DHAD/CHAD/ZHAD/HHAD (Hadamard element-wise vector-vector product)
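A scalar C reference for the real single-precision Hadamard routine might look as follows. It assumes the BLAS-style form z = alpha * (x ⊙ y) + beta * z; check the CLBlast API reference for the exact semantics and signature, which additionally take OpenCL buffers, offsets and a command queue. The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical scalar reference for an element-wise (Hadamard) product:
   z[i] = alpha * x[i] * y[i] + beta * z[i]
   Increments mirror BLAS-style strided access; buffer offsets are omitted. */
void shad_sketch(size_t n, float alpha, const float *x, size_t incx,
                 const float *y, size_t incy, float beta,
                 float *z, size_t incz) {
  for (size_t i = 0; i < n; ++i) {
    z[i * incz] = alpha * x[i * incx] * y[i * incy] + beta * z[i * incz];
  }
}
```

With x = {1, 2, 3}, y = {4, 5, 6}, z = {1, 1, 1}, alpha = 2 and beta = 0.5, z becomes {8.5, 20.5, 36.5}.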