Releases · ModelCloud/GPTQModel · GitHub
Release timestamps (newest first): 17 Dec 11:28, 16 Dec 10:13, 16 Dec 04:11, 15 Dec 10:35, 15 Dec 08:27, 12 Dec 10:04, 09 Dec 11:53, 15 Nov 07:58, 09 Nov 02:28, 02 Nov 17:14
GPT-QModel v5.6.12
1a19cd0
This commit was created on GitHub.com and signed with GitHub’s verified signature.
Notable Changes:
- uv compat: both uv and pip install will now display UI progress for external wheel/dependency downloads.
What's Changed
- [FIX] failed unittest by @ZX-ModelCloud in #2286
- fix wheel name mismatches with version name by @CSY-ModelCloud in #2288
- Setup download progress by @Qubitium in #2289
- Update latest news section in README.md by @Qubitium in #2290
Full Changelog: v5.6.10...v5.6.12
Assets 34
- sha256:e4ed4f51b1ac342fb309ec71371a7feea7b6d7228a20a58df5114e54cc2b0486 (124 MB, 2025-12-17T12:54:06Z)
- sha256:602c7f828c40e476b3dc9b396306c9d68b208fa7a802c87abc00aed196958ac9 (124 MB, 2025-12-17T12:53:48Z)
- sha256:ed5254619b4ede167d9b586e47d457e8ef9a00daa2496b8f8b24e2d6b64bf492 (125 MB, 2025-12-17T12:54:12Z)
- sha256:fe570beef09aa359dc8ca96a5234c3e183b33ee3b8665c65e781d4d3b67b971e (125 MB, 2025-12-17T12:52:36Z)
- sha256:df1fc12b7c0b207a76e208426b131191b54a6621f55892412a6034401c38353f (125 MB, 2025-12-17T12:52:42Z)
- sha256:b6ec689dad018f86dffc813f25ea30488745496518c76d3865ef0c4410848959 (125 MB, 2025-12-17T12:43:11Z)
- sha256:467cb3cd040dfbe4ead3345c7ba19703a6805c8139fee995511943f3a3b7ed24 (125 MB, 2025-12-17T12:42:42Z)
- sha256:a9e1422003ada30afa6bd2a7f16697f5426cccf975cdf804b8bd8fcba384179c (125 MB, 2025-12-17T12:42:45Z)
- sha256:1a81ec47ea9940a04df5d566938ed8741f4411bb86a08ab7f9916447a7d02836 (125 MB, 2025-12-17T12:41:42Z)
- sha256:a88a20d6e7a9bda8bcef138495d297f5fee6705e746e4f8be0b21fd3994400a2 (125 MB, 2025-12-17T12:41:47Z)
- (no digest listed, 2025-12-17T11:27:08Z)
- (no digest listed, 2025-12-17T11:27:08Z)
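Each asset above is published with a SHA-256 digest. The following is a minimal Python sketch for checking a locally downloaded file against one of those digests. It assumes the file is already on disk, and the helper names sha256_of_file and matches_listed_digest are hypothetical. They are not part of GPTQModel.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks so large wheels are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_digest(path: Union[str, Path], listed: str) -> bool:
    """Compare a file with a digest written as in the asset list: 'sha256:<64 hex chars>'."""
    algo, _, expected = listed.partition(":")
    if algo != "sha256" or len(expected) != 64:
        raise ValueError(f"unexpected digest format: {listed!r}")
    return sha256_of_file(path) == expected.lower()
```

For example, calling matches_listed_digest with a hypothetical path such as "downloads/asset.whl" and the string "sha256:e4ed4f51b1ac342fb309ec71371a7feea7b6d7228a20a58df5114e54cc2b0486" returns True only if the local file is byte-identical to the first asset listed. Reading in chunks keeps memory use flat for the roughly 125 MB files.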
GPT-QModel v5.6.10
70a507d
This commit was created on GitHub.com and signed with GitHub’s verified signature.
Notable Changes:
- Triton check by @Qubitium in #2274
- Fix bitblas support for gptq_v2 format by @xxxxyu in #2281
- Fix awq triton kernel has invalid properties by @Qubitium in #2279
What's Changed
- Add kernel selection log by @ZX-ModelCloud in #2275
- Update README.md by @Qubitium in #2276
- Update pypcre depend by @Qubitium in #2277
- Update version.py by @Qubitium in #2278
- Add macos unit tests by @CSY-ModelCloud in #2282
- Update README.md by @Qubitium in #2283
New Contributors
Full Changelog: v5.6.6...v5.6.10
Assets 34
GPT-QModel v5.6.8
711b214
This commit was created on GitHub.com and signed with GitHub’s verified signature.
Notable Changes:
What's Changed
- Add kernel selection log by @ZX-ModelCloud in #2275
- Update README.md by @Qubitium in #2276
Full Changelog: v5.6.6...v5.6.8
Assets 34
v5.6.6
9a79b62
This commit was created on GitHub.com and signed with GitHub’s verified signature.
Notable Changes:
- Use static cuda ctx for triton kernel launch by @Qubitium in #2269
- Remove random-word depend by @LRL2-ModelCloud in #2266
- Update PyPcre depend from 0.2.7 to 0.2.8 by @Qubitium in #2267
What's Changed
- Bump the github-actions group with 2 updates by @dependabot[bot] in #2265
- Update version.py by @Qubitium in #2268
- Ready 5.6.6 by @Qubitium in #2270
Full Changelog: v5.6.2...v5.6.6
Assets 34
GPT-QModel v5.6.4
61e5e7f
This commit was created on GitHub.com and signed with GitHub’s verified signature.
What's Changed
- Bump the github-actions group with 2 updates by @dependabot[bot] in #2265
- remove random-word depend by @LRL2-ModelCloud in #2266
- Update pypcre version from 0.2.7 to 0.2.8 by @Qubitium in #2267
- Update version.py by @Qubitium in #2268
Full Changelog: v5.6.2...v5.6.4
Assets 34
GPT-QModel v5.6.2
d97478f
This commit was created on GitHub.com and signed with GitHub’s verified signature.
Notable Changes:
- FIX JIT PyTorch extension pack_cpu_ext stall by @ZX-ModelCloud in #2248
- Refactor Kernel External Dependency Validation by @LRL2-ModelCloud in #2249
- FIX some models not honoring model.config.use_cache by force-passing use_cache=false by @LRL2-ModelCloud in #2246
- FIX Incorrect Triton dequant_kernel for 3-bit GPTQ (INT3) leads to Triton compile error / wrong dequantization #2251 by @KingdalfGoodman in #2258
- Support llm-awq by @ZX-ModelCloud in #2252
What's Changed
- Update version.py by @Qubitium in #2247
- Update README.md by @davedgd in #2250
- [CI] add torch 2.9.1 by @CSY-ModelCloud in #2254
- Update license declaration in pyproject.toml by @CSY-ModelCloud in #2259
- Modify setup by @Qubitium in #2260
- Add release notes for version 5.6.2 by @Qubitium in #2261
- fix test_quant_formats.py by @LRL2-ModelCloud in #2262
- [CI] mount dataset dir to /monster/data/model/dataset by @CSY-ModelCloud in #2263
- fix parsing args by @CSY-ModelCloud in #2264
New Contributors
- @KingdalfGoodman made their first contribution in #2258
Full Changelog: v5.6.0...v5.6.2
Assets 34
GPT-QModel v5.6.0
b63b373
This commit was created on GitHub.com and signed with GitHub’s verified signature.
Notable Changes:
- HF Kernel for CPU: AMX, AVX2, AVX512 optimized by @jiqing-feng in #2232
- Fix: Resolve performance regression during initial forward pass with offload_to_disk by @avtc in #2239
- Auto module tree by @LRL2-ModelCloud in #2204
- Afmoe support by @LRL2-ModelCloud in #2243
- Add dots1 by @Qubitium in #2231
What's Changed
- Update description and code about GPTAQ in README.md by @wayneguow in #2202
- Update test cases for qwen2.5-vl and qwen3-vl by @wayneguow in #2203
- Optimize minimax m2 modelling forward pass by @avtc in #2176
- remove gemm ipex by @LRL2-ModelCloud in #2206
- Bump actions/checkout from 5 to 6 in the github-actions group by @dependabot[bot] in #2207
- Update device-smi dependency version to 0.5.2 by @Qubitium in #2208
- Fix loading an AWQ-quantized model with GPTQModel when it is not actu… by @LRL2-ModelCloud in #2209
- fix exllama v2 post init by @LRL2-ModelCloud in #2211
- [FIX] Add fallback for "module_dir" and "entry key" lookup by @ZX-ModelCloud in #2210
- Update unit_tests.yml by @Qubitium in #2213
- fix mps backend does not implement float64 by @Qubitium in #2216
- [FIX] _apply_quant() not being called with awq by @ZX-ModelCloud in #2218
- Fix AWQ Extension by @LRL2-ModelCloud in #2217
- Auto AWQ kernel selection for Transformers compat by @Qubitium in #2214
- Fix add bias for torch_fuse by @jiqing-feng in #2223
- [CI] Add torch_fused test with Bias by @ZX-ModelCloud in #2222
- [FIX] device_map with cpu only causing CpuOffload hooks to be injected by @ZX-ModelCloud in #2225
- fix awq apply_scale and apply_clip multi thread issue by @LRL2-ModelCloud in #2224
- Fix CI test not passing by @Qubitium in #2226
- Monkeypatch lm-eval latest broken imports by @Qubitium in #2227
- make file can be pytest called by @CSY-ModelCloud in #2228
- CI Fix awq weight mean by @LRL2-ModelCloud in #2229
- fix pycharm auto imported wrong path by @CSY-ModelCloud in #2230
- [FIX] TorchFusedAwqQuantLinear selection by @ZX-ModelCloud in #2233
- [CI] update CI path by @CSY-ModelCloud in #2236
- [Model] Mistral3 support by @LRL2-ModelCloud in #2238
- Update setup.py by @Qubitium in #2240
- Increase MAX_JOBS from 4 to 8 in release.yml by @Qubitium in #2241
- [FIX] non-persistent buffer was saved incorrectly by @ZX-ModelCloud in #2242
New Contributors
- @wayneguow made their first contribution in #2202
Assets 34
GPT-QModel v5.4.2
6c3e279
This commit was created on GitHub.com and signed with GitHub’s verified signature.
Notable Changes:
- Fix double fwd regression by @Qubitium in #2198
- Add cli: gptqmodel env by @ZX-ModelCloud in #2192
- [CI] compile wheel with python -m build by @CSY-ModelCloud in #2193
What's Changed
- Start v5.5.0 devel branch (odd version) by @Qubitium in #2191
- Update version from 5.5.0 to 5.4.2 patch release by @Qubitium in #2199
- [CI] copy wheel to local dir instead of using http server by @CSY-ModelCloud in #2200
Full Changelog: v5.4.0...v5.4.2
Assets 34
GPT-QModel v5.4.0
e0da12a
This commit was created on GitHub.com and signed with GitHub’s verified signature.
Notable Changes:
- AWQ Torch Fused Kernel by @Qubitium in #2190
- Make torch fused op compilable by @jiqing-feng in #2182
- [FIX] AWQ MoE by @ZX-ModelCloud in #2171
- add :? capture only syntax by @Qubitium in #2173
What's Changed
- Update latest news section in README.md by @Qubitium in #2166
- run forward pass even for empty subset to produce correct layer outputs by @avtc in #2161
- Reduce AWQ memory usage by @Qubitium in #2167
- Awq update by @Qubitium in #2168
- Retry partial.to to fix accelerate invalid argument for first moe layer (reapply) by @avtc in #2169
- Awq update by @Qubitium in #2172
- adjust retry partial.to by @avtc in #2175
- cleanup awq_get_modules_for_scaling() by @ZX-ModelCloud in #2179
- [FIX] qwen3 moe sparse moe block by @ZX-ModelCloud in #2184
- Add module convert by @LRL2-ModelCloud in #2183
- Cleanup by @Qubitium in #2185
- Update pypcre version to 0.2.5 by @LRL2-ModelCloud in #2186
- Update pypcre version to 0.2.5 by @Qubitium in #2189
- [FIX] version("triton") crash on torch+xpu by @ZX-ModelCloud in #2188
Full Changelog: v5.2.0...v5.4.0
Assets 34
GPT-QModel v5.2.0
baf9674
This commit was created on GitHub.com and signed with GitHub’s verified signature.
Notable Changes:
- Minimax M2, Granite Nano, Qwen3-VL, Brumby model support
- AWQ quantization is now out of beta and fully integrated into the life cycle
- New VramStrategy.Balanced property to spread MoE modules across different GPUs
- New pure torch AWQ kernel
- New calibration_concat_separator property
- Fixed HF bug that did not save mtp layers for GLM 4.5/4.6 (Air) models
- Fixed multi-GPU CUDA asserts due to stream/sync
What's Changed
- try not adding mem guards for marlin kernel launch protection by @Qubitium in #2108
- MoE vram by @Qubitium in #2110
- Fix GLM 4.5/4.6 and Air not saving mtp layer after save (HF bug) by @LRL2-ModelCloud in #2109
- torchao 0.14.1 update by @Qubitium in #2111
- Test refactor by @Qubitium in #2113
- Bump the github-actions group with 2 updates by @dependabot[bot] in #2120
- [FIX] xpu unit test by @ZX-ModelCloud in #2122
- modular by @Qubitium in #2123
- update scores by @Qubitium in #2124
- Fp8 dequant by @Qubitium in #2125
- Model dequant by @Qubitium in #2126
- Fp4 e2m1 by @Qubitium in #2127
- [FIX] ovis2, compatible with transformers v4.57.1 by @ZX-ModelCloud in #2129
- fix cols padding by @LRL2-ModelCloud in #2130
- [FIX] ovis_1_6 quantization by @ZX-ModelCloud in #2131
- Minimax m2 by @Qubitium in #2128
- Fix awq marlin kernel for bf16 by @Qubitium in #2135
- [FIX] incorrect AWQ NODES by @ZX-ModelCloud in #2133
- add support_offload_to_disk check by @LRL2-ModelCloud in #2134
- Add Awq torch kernel by @Qubitium in #2137
- Marin by @Qubitium in #2139
- Marin scores by @Qubitium in #2141
- Fix triton version detection in nogil patcher by @amd-vlarakic in #2144
- Fix qwen2 omni by @LRL2-ModelCloud in #2140
- [MODEL] Add GraniteMoEHybrid by @ZX-ModelCloud in #2142
- Fold AWQ into proper Looper/Layer/Subset Lifecycle by @Qubitium in #2138
- Refine GPT-QModel description in README by @Qubitium in #2145
- fix device_map by @LRL2-ModelCloud in #2146
- [MODEL] Add Qwen3-VL by @techshoww in #2136
- Add calibration_concat_separator by @Qubitium in #2148
- add test_qwen3_vl.py by @LRL2-ModelCloud in #2147
- Fix triton monkeypatch by @Qubitium in #2149
- [MODEL] Add Brumby by @Qubitium in #2150
- Dedup/Cleanup by @Qubitium in #2151
- Prep for 5.2 release by @Qubitium in #2152
- Dedup3 by @Qubitium in #2153
- add missing file by @Qubitium in #2154
- GPTAQ rename by @Qubitium in #2155
- fix ci test by @Qubitium in #2158
- fix setup license by @Qubitium in #2160
- Fix snapshot_download receiving unsupported kwargs by @Qubitium in #2162
- Retry partial.to to fix accelerate invalid argument error for first moe layer for >4 GPU setups by @avtc in #2163
- Comments + Sync by @Qubitium in #2164
- Stats/Logs by @Qubitium in #2165
New Contributors
- @amd-vlarakic made their first contribution in #2144
- @techshoww made their first contribution in #2136
Full Changelog: v5.0.0...v5.2.0
Assets 34