Releases · turboderp-org/exllamav3 · GitHub
13 Oct 21:42
09 Oct 22:12
28 Sep 15:44
17 Aug 14:58
19 Jul 02:05
15 Jun 05:05
31 May 15:56
12 May 16:57
09 May 18:38
Releases: turboderp-org/exllamav3
0.0.9
- Lock MCG and MUL1 multipliers, no longer flag as experimental
- Switch to MCG codebook by default for new models (use --codebook 3inst for previous default)
- Add more calibration data
- Increase default calibration size to 250 rows (use --cal_rows 100 for previous default)
- Fix quantized cache for bsz > 1
- Fix kernel selection on A100
- A few more TP-related fixes
Full Changelog: v0.0.8...v0.0.9
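The two flags named in the 0.0.9 notes can be combined to get back the earlier quantization defaults. A minimal sketch of that, assuming the flags are passed on the command line of whatever conversion script you already use; the script name `convert.py` in the usage line is a placeholder for illustration and is not taken from these notes:

```python
def with_previous_defaults(command):
    """Return a copy of `command` with the pre-0.0.9 defaults appended.

    The flag names and values come from the 0.0.9 release notes:
    --codebook 3inst  (previous default codebook)
    --cal_rows 100    (previous default calibration size)
    """
    return list(command) + ["--codebook", "3inst", "--cal_rows", "100"]


if __name__ == "__main__":
    # "convert.py" is a hypothetical script name used only for this example.
    print(" ".join(with_previous_defaults(["python", "convert.py"])))
```

The helper only builds an argument list; it does not run anything, so it can be passed to `subprocess.run` or printed for inspection first.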
Assets 19
- sha256:8dacaffd49e1784fd48881166fdcb09bebd94fa1327ab2d8db4516c51393c4a7 (140 MB, 2025-10-13T21:59:35Z)
- sha256:1a529e4ff143d7416f876c16c5400feb3727599cdf425f0a099d0692cfdaa314 (127 MB, 2025-10-13T22:04:16Z)
- sha256:f084a965966aa323f5847e6b876f67b34a0ebebe86b277e64880842d9c3fa863 (140 MB, 2025-10-13T22:08:23Z)
- sha256:6a1c0208fdc0df630c591dfdf8a8aab4dc49c56eb6bd03b0796a144c3782beca (127 MB, 2025-10-13T22:03:47Z)
- sha256:4f9741bfd94576039ced76c502a237a47f14224bc3a09cf99f514fec720be135 (140 MB, 2025-10-13T22:08:08Z)
- sha256:d233d36ef07cc873c4e41e3357b9ce740466398f2b26f3176515c5ce81b30c73 (127 MB, 2025-10-13T22:03:07Z)
- sha256:3cb774656c545c917e04e72f61ea7f33b81e5b34177246b314a6e40300316384 (140 MB, 2025-10-13T22:09:27Z)
- sha256:bacd975ebde23cc671f7d4fbf99478e9d639d6085438f3749e3bef274433e11d (127 MB, 2025-10-13T22:04:11Z)
- sha256:8226fb6ad16396bddc5691c5ebc7eb782001cd31c8a8315bca6334f8e26d782a (140 MB, 2025-10-13T22:07:47Z)
- sha256:2c3ad0e6ed25fa2a2974b80416b0a040fc5c936fbbd2c26b231a21449c7f0ca6 (127 MB, 2025-10-13T23:00:17Z)
- (no checksum or size listed) 2025-10-13T21:33:38Z
- (no checksum or size listed) 2025-10-13T21:33:38Z
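The sha256 digests listed for the 0.0.9 assets can be checked against a downloaded file. The asset file names did not survive in this listing, so the sketch below takes the path and the expected digest as inputs; the helper names `sha256_of` and `matches_digest` are made up for this example:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected):
    """Compare a file against an expected digest.

    `expected` may carry the "sha256:" prefix used in the asset list.
    """
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("sha256:"):
        expected_hex = expected_hex[len("sha256:"):]
    return sha256_of(path) == expected_hex
```

Reading in chunks keeps memory use flat for assets in the 127 to 140 MB range.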
0.0.8
- New GEMM kernel tuning scheme
- Fix banned strings regression
- Fix some memory leaks
- Fix potential stack overflow in cache defrag
Full Changelog: v0.0.7...v0.0.8
Assets 19
0.0.7
- Support SeedOssForCausalLM
- Support ApertusForCausalLM
- Support Qwen3NextForCausalLM¹
- Reduced CPU overhead
- Fix support for non-AVX2 CPUs
- Optimized GEMM kernels
- Faster quantization, especially on Blackwell
- Quant optimizer utils
- Much lower overhead from quantized cache
- Tensor split option for MoE layers with large experts
- Add recurrent model support to generator
- Generator now allows allocating pages on the fly
- Many more improvements and bugfixes
¹ Qwen3-Next currently requires Triton and Flash Linear Attention. causal-conv1d is recommended but not required. Triton-free implementation is in the works for v0.0.8.
Full Changelog: v0.0.6...v0.0.7
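Before loading a Qwen3-Next model on 0.0.7, it may help to check whether the packages named in the footnote are installed. A minimal sketch, assuming the import names `triton`, `fla` (Flash Linear Attention) and `causal_conv1d`; these module names are assumptions, since the notes only give the project names:

```python
from importlib.util import find_spec

# Assumed import names; the release notes list only the project names.
REQUIRED = {"Triton": "triton", "Flash Linear Attention": "fla"}
RECOMMENDED = {"causal-conv1d": "causal_conv1d"}


def missing_modules(modules):
    """Return the display names whose top-level module cannot be found."""
    return [name for name, module in modules.items() if find_spec(module) is None]


if __name__ == "__main__":
    print("Missing required:", missing_modules(REQUIRED))
    print("Missing recommended:", missing_modules(RECOMMENDED))
```

`find_spec` only looks the module up without importing it, so the check stays cheap and does not initialize any GPU libraries.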
Assets 19
0.0.6
- Add tensor-parallel mode
- Add support for Arcee architecture
- Add support for GLM4 architecture (GLM4.5, GLM4.5-Air)
- Fix CPU bottleneck in model loader
- Reduce VRAM usage during quantization
- Fused MoE routing kernels
- Various bugfixes
- QoL improvements
Full Changelog: v0.0.5...v0.0.6
Assets 19
0.0.5
- Add filter interface
- Add Formatron support (JSON/KBNF grammar etc.)
- Support Ernie 4.5 architecture
- Support SmolLM3 architecture
- Support Exaone4 architecture
- Improve Cohere2 support (fixes support for Command-A)
- Fix compatibility with certain Pixtral preprocessors
- Fix excessive virtual memory usage with large generator queues on Windows
- Add facilities for manually mixing/optimizing quants
- Add MMLU eval script
- Various other fixes, improvements and optimizations
Full Changelog: v0.0.4...v0.0.5
Assets 19
0.0.4
- Vision support
- Support MiMoForCausalLM (no MTP)
- Support Gemma3ForConditionalGeneration
- Support Gemma3ForCausalLM
- Support Mistral3ForConditionalGeneration (Mistral 3.1)
- Support Dots1ForCausalLM (dots.llm1)
- Add Transformers integration
- Faster tokenizer initialization
- Various fixes
Full Changelog: v0.0.3...v0.0.4
Assets 11
0.0.3
- Support Mixtral architecture
- Support Gemma3 architecture (text only)
- Fixed support for Command-R+
- Return probs and top probs from generator
- Defrag for generator
- Many bugfixes
- Many performance improvements
- A bunch of new eval and test functionality
Full Changelog: v0.0.2...v0.0.3
Assets 11
0.0.2
a905cff
- Adds Qwen3MoE support
- Various bugfixes and optimizations
Full Changelog: v0.0.1...v0.0.2
Assets 11
0.0.1
Actions: Add build action
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Assets 11