GPT-QModel v5.6.12

@ZX-ModelCloud

Notable Changes:

uv compatibility
Both uv and pip installs now display UI progress for external wheel/dependency downloads.

What's Changed

[FIX] failed unittest by @ZX-ModelCloud in #2286
Fix wheel name mismatches with version name by @CSY-ModelCloud in #2288
Setup download progress by @Qubitium in #2289
Update latest news section in README.md by @Qubitium in #2290

Full Changelog: v5.6.10...v5.6.12

GPT-QModel v5.6.10

@Qubitium

Notable Changes:

Triton check by @Qubitium in #2274
Fix bitblas support for gptq_v2 format by @xxxxyu in #2281
Fix awq triton kernel has invalid properties by @Qubitium in #2279

What's Changed

Add kernel selection log by @ZX-ModelCloud in #2275
Update README.md by @Qubitium in #2276
Update pypcre depend by @Qubitium in #2277
Update version.py by @Qubitium in #2278
Add macos unit tests by @CSY-ModelCloud in #2282
Update README.md by @Qubitium in #2283

New Contributors

@xxxxyu made their first contribution in #2281

Full Changelog: v5.6.6...v5.6.10

GPT-QModel v5.6.8

@Qubitium

Notable Changes:

Fix Triton check/import by @Qubitium in #2274

What's Changed

Add kernel selection log by @ZX-ModelCloud in #2275
Update README.md by @Qubitium in #2276

Full Changelog: v5.6.6...v5.6.8

v5.6.6

@Qubitium

Notable Changes:

Use static cuda ctx for triton kernel launch by @Qubitium in #2269
Remove random-word depend by @LRL2-ModelCloud in #2266
Update PyPcre depend from 0.2.7 to 0.2.8 by @Qubitium in #2267

What's Changed

Bump the github-actions group with 2 updates by @dependabot[bot] in #2265
Update version.py by @Qubitium in #2268
Ready 5.6.6 by @Qubitium in #2270

Full Changelog: v5.6.2...v5.6.6

GPT-QModel v5.6.4

@LRL2-ModelCloud

What's Changed

Bump the github-actions group with 2 updates by @dependabot[bot] in #2265
remove random-word depend by @LRL2-ModelCloud in #2266
Update pypcre version from 0.2.7 to 0.2.8 by @Qubitium in #2267
Update version.py by @Qubitium in #2268

Full Changelog: v5.6.2...v5.6.4

GPT-QModel v5.6.2

@ZX-ModelCloud

Notable Changes

FIX JIT Pytorch extension pack_cpu_ext stall by @ZX-ModelCloud in #2248
Refactor Kernel External Dependency Validation by @LRL2-ModelCloud in #2249
FIX some models not honoring model.config.use_cache by forcibly passing use_cache=false by @LRL2-ModelCloud in #2246
FIX Incorrect Triton dequant_kernel for 3-bit GPTQ (INT3) leads to Triton compile error / wrong dequantization #2251 by @KingdalfGoodman in #2258
Support llm-awq by @ZX-ModelCloud in #2252

What's Changed

Update version.py by @Qubitium in #2247
Update README.md by @davedgd in #2250
[CI] add torch 2.9.1 by @CSY-ModelCloud in #2254
Update license declaration in pyproject.toml by @CSY-ModelCloud in #2259
Modify setup by @Qubitium in #2260
Add release notes for version 5.6.2 by @Qubitium in #2261
fix test_quant_formats.py by @LRL2-ModelCloud in #2262
[CI] mount dataset dir to /monster/data/model/dataset by @CSY-ModelCloud in #2263
fix parsing args by @CSY-ModelCloud in #2264

New Contributors

@KingdalfGoodman made their first contribution in #2258

Full Changelog: v5.6.0...v5.6.2
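
The use_cache fix in #2246 above describes forcing use_cache=false on the forward call rather than relying on model.config.use_cache. A minimal Python sketch of that idea, assuming the model is any callable that accepts keyword arguments; `forward_without_cache` and `fake_model` are hypothetical names used for illustration and are not GPTQModel code:

```python
from typing import Any, Callable


def forward_without_cache(model: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call the model with use_cache explicitly disabled.

    Overrides any caller-supplied use_cache value, so the result does not
    depend on whether the model honors its own config.
    """
    kwargs["use_cache"] = False
    return model(*args, **kwargs)


def fake_model(*args: Any, **kwargs: Any) -> dict:
    """Hypothetical stand-in model that echoes the keyword arguments it received."""
    return kwargs


if __name__ == "__main__":
    # The caller asks for caching, but the wrapper forces it off.
    print(forward_without_cache(fake_model, use_cache=True))
```

Passing the flag explicitly at call time sidesteps models that ignore their config value, at the cost of overriding a caller who genuinely wants caching.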

GPT-QModel v5.6.0

@jiqing-feng

Notable Changes:

HF Kernel for CPU: AMX, AVX2, AVX512 optimized by @jiqing-feng in #2232
Fix: Resolve performance regression during initial forward pass with offload_to_disk by @avtc in #2239
Auto module tree by @LRL2-ModelCloud in #2204
Afmoe support by @LRL2-ModelCloud in #2243
Add dots1 by @Qubitium in #2231

What's Changed

Update description and code about GPTAQ in README.md by @wayneguow in #2202
Update test cases for qwen2.5-vl and qwen3-vl by @wayneguow in #2203
Optimize minimax m2 modelling forward pass by @avtc in #2176
remove gemm ipex by @LRL2-ModelCloud in #2206
Bump actions/checkout from 5 to 6 in the github-actions group by @dependabot[bot] in #2207
Update device-smi dependency version to 0.5.2 by @Qubitium in #2208
Fix loading an AWQ-quantized model with GPTQModel when it is not actu… by @LRL2-ModelCloud in #2209
fix exllama v2 post init by @LRL2-ModelCloud in #2211
[FIX] Add fallback for "module_dir" and "entry key" lookup by @ZX-ModelCloud in #2210
Update unit_tests.yml by @Qubitium in #2213
fix mps backend does not implement float64 by @Qubitium in #2216
[FIX] _apply_quant() not being called with awq by @ZX-ModelCloud in #2218
Fix AWQ Extension by @LRL2-ModelCloud in #2217
Auto AWQ kernel selection for Transformers compat by @Qubitium in #2214
Fix add bias for torch_fuse by @jiqing-feng in #2223
[CI] Add torch_fused test with Bias by @ZX-ModelCloud in #2222
[FIX] device_map with cpu only causing CpuOffload hooks to be injected by @ZX-ModelCloud in #2225
fix awq apply_scale and apply_clip multi thread issue by @LRL2-ModelCloud in #2224
Fix CI test not passing by @Qubitium in #2226
Monkeypatch lm-eval latest broken imports by @Qubitium in #2227
make file can be pytest called by @CSY-ModelCloud in #2228
CI Fix awq weight mean by @LRL2-ModelCloud in #2229
fix pycharm auto imported wrong path by @CSY-ModelCloud in #2230
[FIX] TorchFusedAwqQuantLinear selection by @ZX-ModelCloud in #2233
[CI] update CI path by @CSY-ModelCloud in #2236
[Model] Mistral3 support by @LRL2-ModelCloud in #2238
Update setup.py by @Qubitium in #2240
Increase MAX_JOBS from 4 to 8 in release.yml by @Qubitium in #2241
[FIX] non-persistent buffer was saved incorrectly by @ZX-ModelCloud in #2242

New Contributors

@wayneguow made their first contribution in #2202

GPT-QModel v5.4.2

@Qubitium

Notable Changes:

Fix double fwd regression by @Qubitium in #2198
Add cli: gptqmodel env by @ZX-ModelCloud in #2192
[CI] compile wheel with python -m build by @CSY-ModelCloud in #2193

What's Changed

Start v5.5.0 devel branch (odd version) by @Qubitium in #2191
Update version from 5.5.0 to 5.4.2 patch release by @Qubitium in #2199
[CI] copy wheel to local dir instead of using http server by @CSY-ModelCloud in #2200
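
The "odd version" note in #2191 suggests a numbering convention in which development branches carry an odd minor version (5.5.0) while published releases use even ones (5.4.x). A minimal Python sketch of that check, assuming the parity applies to the minor component; the helper name `is_devel_version` is hypothetical and not part of GPTQModel:

```python
def is_devel_version(version: str) -> bool:
    """Return True when the minor component of a version string is odd.

    Assumption: odd minor versions mark development branches, as the
    "v5.5.0 devel branch (odd version)" entry suggests.
    """
    # Accept an optional leading "v", as used in the release tags.
    parts = version.lstrip("v").split(".")
    minor = int(parts[1])
    return minor % 2 == 1


if __name__ == "__main__":
    for tag in ("v5.4.2", "5.5.0", "v5.6.12"):
        print(tag, "devel" if is_devel_version(tag) else "release")
```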

Full Changelog: v5.4.0...v5.4.2

GPT-QModel v5.4.0

@Qubitium

Notable Changes:

AWQ Torch Fused Kernel by @Qubitium in #2190
Make torch fused op compilable by @jiqing-feng in #2182
[FIX] AWQ MoE by @ZX-ModelCloud in #2171
add :? capture only syntax by @Qubitium in #2173

What's Changed

Update latest news section in README.md by @Qubitium in #2166
run forward pass even for empty subset to produce correct layer outputs by @avtc in #2161
Reduce AWQ memory usage by @Qubitium in #2167
Awq update by @Qubitium in #2168
Retry partial.to to fix accelerate invalid argument for first moe layer (reapply) by @avtc in #2169
Awq update by @Qubitium in #2172
adjust retry partial.to by @avtc in #2175
cleanup awq_get_modules_for_scaling() by @ZX-ModelCloud in #2179
[FIX] qwen3 moe sparse moe block by @ZX-ModelCloud in #2184
Add module convert by @LRL2-ModelCloud in #2183
Cleanup by @Qubitium in #2185
Update pypcre version to 0.2.5 by @LRL2-ModelCloud in #2186
Update pypcre version to 0.2.5 by @Qubitium in #2189
[FIX] version("triton") crash on torch+xpu by @ZX-ModelCloud in #2188
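
The crash fixed in #2188 is the kind that occurs when a package version is looked up under a distribution name that is not installed. A minimal Python sketch of a defensive lookup, assuming `version` refers to `importlib.metadata.version` and that XPU builds of PyTorch ship Triton under a different distribution name; the helper `first_installed_version` and the candidate name `pytorch-triton-xpu` are assumptions for illustration, not the project's actual fix:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Iterable, Optional


def first_installed_version(candidates: Iterable[str]) -> Optional[str]:
    """Return the version of the first installed distribution, else None."""
    for name in candidates:
        try:
            return version(name)
        except PackageNotFoundError:
            # Not installed under this name; try the next candidate.
            continue
    return None


if __name__ == "__main__":
    # Candidate names are assumptions; adjust them to the target platform.
    print(first_installed_version(["triton", "pytorch-triton-xpu"]))
```

Returning None instead of raising lets the caller treat a missing Triton as "feature unavailable" rather than as a fatal import error.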

Full Changelog: v5.2.0...v5.4.0

GPT-QModel v5.2.0

@Qubitium

Notable Changes:

Minimax M2, Granite Nano, Qwen3-VL, and Brumby model support
AWQ quantization is now out of beta and fully integrated into the lifecycle
New VramStrategy.Balanced property to spread MoE modules across different GPUs
New pure torch AWQ kernel
New calibration_concat_separator property
Fixed HF bug that did not save mtp layers for GLM 4.5/4.6 (Air) models
Fixed multi-gpu cuda asserts due to stream/sync

What's Changed

try not adding mem guards for marlin kernel launch protection by @Qubitium in #2108
MoE vram by @Qubitium in #2110
Fix GLM 4.5/4.6 and Air not saving mtp layer after save (HF bug) by @LRL2-ModelCloud in #2109
torchao 0.14.1 update by @Qubitium in #2111
Test refactor by @Qubitium in #2113
Bump the github-actions group with 2 updates by @dependabot[bot] in #2120
[FIX] xpu unit test by @ZX-ModelCloud in #2122
modular by @Qubitium in #2123
update scores by @Qubitium in #2124
Fp8 dequant by @Qubitium in #2125
Model dequant by @Qubitium in #2126
Fp4 e2m1 by @Qubitium in #2127
[FIX] ovis2, compatible with transformers v4.57.1 by @ZX-ModelCloud in #2129
fix cols padding by @LRL2-ModelCloud in #2130
[FIX] ovis_1_6 quantization by @ZX-ModelCloud in #2131
Minimax m2 by @Qubitium in #2128
Fix awq marlin kernel for bf16 by @Qubitium in #2135
[FIX] incorrect AWQ NODES by @ZX-ModelCloud in #2133
add support_offload_to_disk check by @LRL2-ModelCloud in #2134
Add Awq torch kernel by @Qubitium in #2137
Marin by @Qubitium in #2139
Marin scores by @Qubitium in #2141
Fix triton version detection in nogil patcher by @amd-vlarakic in #2144
Fix qwen2 omni by @LRL2-ModelCloud in #2140
[MODEL] Add GraniteMoEHybrid by @ZX-ModelCloud in #2142
Fold AWQ into proper Looper/Layer/Subset Lifecycle by @Qubitium in #2138
Refine GPT-QModel description in README by @Qubitium in #2145
fix device_map by @LRL2-ModelCloud in #2146
[MODEL] Add Qwen3-VL by @techshoww in #2136
Add calibration_concat_separator by @Qubitium in #2148
add test_qwen3_vl.py by @LRL2-ModelCloud in #2147
Fix triton monkeypatch by @Qubitium in #2149
[MODEL] Add Brumby by @Qubitium in #2150
Dedup/Cleanup by @Qubitium in #2151
Prep for 5.2 release by @Qubitium in #2152
Dedup3 by @Qubitium in #2153
add missing file by @Qubitium in #2154
GPTAQ rename by @Qubitium in #2155
fix ci test by @Qubitium in #2158
fix setup license by @Qubitium in #2160
Fix snapshot_download receiving unsupported kwargs by @Qubitium in #2162
Retry partial.to to fix accelerate invalid argument error for first moe layer for >4 GPU setups by @avtc in #2163
Comments + Sync by @Qubitium in #2164
Stats/Logs by @Qubitium in #2165

New Contributors

@amd-vlarakic made their first contribution in #2144
@techshoww made their first contribution in #2136

Full Changelog: v5.0.0...v5.2.0

Releases: ModelCloud/GPTQModel

Versions listed above, newest first: v5.6.12, v5.6.10, v5.6.8, v5.6.6, v5.6.4, v5.6.2, v5.6.0, v5.4.2, v5.4.0, v5.2.0