Releases: withcatai/node-llama-cpp
v3.10.0 (59cf309)
3.10.0 (2025-06-12)
Features
- JSON Schema Grammar: `$defs` and `$ref` support with full inferred types (#472) (9cdbce9) (example below)
- `inspect gguf` command: format and print the Jinja chat template with `--key .chatTemplate` (#472) (9cdbce9)
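A minimal sketch of how the new `$defs`/`$ref` support might be used with `llama.createGrammarForJsonSchema()`; the model path and schema contents here are illustrative, not from the release notes:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"}); // hypothetical path
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

// A schema using $defs and $ref; the type of the parsed result
// is now fully inferred, including the referenced definitions
const grammar = await llama.createGrammarForJsonSchema({
    $defs: {
        person: {
            type: "object",
            properties: {
                name: {type: "string"},
                birthYear: {type: "number"}
            }
        }
    },
    type: "object",
    properties: {
        author: {$ref: "#/$defs/person"}
    }
} as const);

const res = await session.prompt("Who wrote 'The Hobbit' and when was he born?", {grammar});
const parsed = grammar.parse(res); // typed according to the schema, $ref targets included
console.log(parsed.author?.name);
```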
Bug Fixes
- `JinjaTemplateChatWrapper`: first function call prefix detection (#472) (9cdbce9)
- `QwenChatWrapper`: improve Qwen chat template detection (#472) (9cdbce9)
- apply `maxTokens` on function calling parameters (#472) (9cdbce9)
- adjust default prompt completion length based on SWA size when relevant (#472) (9cdbce9)
- improve thought segmentation syntax extraction (#472) (9cdbce9)
- adapt to `llama.cpp` changes (#472) (9cdbce9)
Shipped with `llama.cpp` release b5640
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.9.0 (ea8d904)
3.9.0 (2025-06-04)
Features
- reasoning budget (#468) (ea8d904) (documentation: Set Reasoning Budget) (example below)
- SWA (Sliding Window Attention) support: greatly reduced context memory consumption on supported models (#468) (ea8d904)
- documentation: LLM-friendly `llms.md` and `llms-full.md` files (#468) (ea8d904)
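A minimal sketch of capping reasoning with the new budget, assuming the `budgets.thoughtTokens` prompt option described in the "Set Reasoning Budget" documentation; the model path and prompt are illustrative:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/reasoning-model.gguf"}); // hypothetical path
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

// Cap how many tokens the model may spend on "thought" segments
// before it must move on to the final answer
const res = await session.prompt("Plan a 3-day trip to Rome", {
    budgets: {
        thoughtTokens: 100 // assumed option shape, per the linked docs
    }
});
console.log(res);
```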
Bug Fixes
Shipped with `llama.cpp` release b5590
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.8.1 (1799127)
3.8.1 (2025-05-19)
Bug Fixes
- `getLlamaGpuTypes`: edge case (#463) (1799127)
- remove prompt completion from the cached context window (#463) (1799127)
Shipped with `llama.cpp` release b5415
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.8.0 (f2cb873)
3.8.0 (2025-05-17)
Features
- save and restore a context sequence state (#460) (f2cb873) (documentation: Saving and restoring a context sequence evaluation state) (example below)
- stream function call parameters (#460) (f2cb873) (documentation: API: `LLamaChatPromptOptions["onFunctionCallParamsChunk"]`)
- configure Hugging Face remote endpoint for resolving URIs (#460) (f2cb873) (documentation: API: `ResolveModelFileOptions["endpoints"]`)
- Qwen 3 support (#460) (f2cb873)
- `QwenChatWrapper`: support discouraging the generation of thoughts (#460) (f2cb873) (documentation: API: `QwenChatWrapper` constructor > `thoughts` option)
- `getLlama`: `dryRun` option (#460) (f2cb873) (documentation: API: `LlamaOptions["dryRun"]`)
- `getLlamaGpuTypes` function (#460) (f2cb873) (documentation: API: `getLlamaGpuTypes`)
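A minimal sketch of saving a context sequence state so a later run can skip re-evaluation, assuming `saveStateToFile`/`loadStateFromFile` methods on the sequence as described in the linked documentation; the file names, model path, and the `acceptRisk` flag are assumptions:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"}); // hypothetical path
const context = await model.createContext();
const sequence = context.getSequence();

const session = new LlamaChatSession({contextSequence: sequence});
await session.prompt("Summarize the rules of chess");

// Persist the evaluated sequence state to disk (assumed API shape)
await sequence.saveStateToFile("state.bin");

// In a later run, restore onto a fresh sequence of the same model instead
// of re-evaluating the prompt; loading external state files is assumed to
// require explicitly opting into the risk:
// await freshSequence.loadStateFromFile("state.bin", {acceptRisk: true});
```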
Bug Fixes
- adapt to breaking `llama.cpp` changes (#460) (f2cb873)
- capture multi-token segment separators (#460) (f2cb873)
- race condition when reading extremely long gguf metadata (#460) (f2cb873)
- adapt memory estimation to newly added model architectures (#460) (f2cb873)
- skip binary testing on certain problematic conditions (#460) (f2cb873)
- improve GPU backend loading error description (#460) (f2cb873)
Shipped with `llama.cpp` release b5414
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.7.0 (c070e81)
3.7.0 (2025-03-28)
Features
- extract function calling syntax from a Jinja template (#444) (c070e81)
- full support for Qwen and QwQ via `QwenChatWrapper` (#444) (c070e81)
- export a `llama` instance getter on a model instance (#444) (c070e81) (example below)
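A minimal sketch of the new `llama` getter on a model instance, assuming it exposes the `Llama` instance the model was loaded with; the model path is illustrative:

```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"}); // hypothetical path

// Code that only receives the model can now reach the backend through it
console.log(model.llama === llama); // true
console.log(model.llama.gpu); // the active GPU backend, e.g. "metal" or "cuda"
```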
Bug Fixes
- better handling for function calling with empty parameters (#444) (c070e81)
- reranking edge case crash (#444) (c070e81)
- limit the context size by default in the node-typescript template (#444) (c070e81)
- adapt to breaking `llama.cpp` changes (#444) (c070e81)
- bump min Node.js version to 20 due to dependencies' requirements (#444) (c070e81)
- `defineChatSessionFunction` type (#444) (c070e81)
Shipped with `llama.cpp` release b4980
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.6.0 (599a161)
✨ DeepSeek R1 is here! ✨
Read about the release in the blog post
3.6.0 (2025-02-21)
Features
- DeepSeek R1 support (#428) (ca6b901) (documentation: DeepSeek R1)
- chain of thought segmentation (#428) (ca6b901) (documentation: Stream Response Segments)
- pass a model to `resolveChatWrapper` (#428) (ca6b901)
- `defineChatSessionFunction`: improve `params` type (#428) (ca6b901) (example below)
- Electron template: show chain of thought (#428) (ca6b901) (documentation: DeepSeek R1)
- Electron template: add functions template (#428) (ca6b901)
- Electron template: new icon for the CI build (#428) (ca6b901)
- Electron template: update model message in a more stable manner (#428) (ca6b901)
- Electron template: more convenient completion (#428) (ca6b901)
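A minimal sketch of a typed function definition with `defineChatSessionFunction`, showing how the improved `params` typing flows into the handler; the weather function, its schema, and the model path are illustrative, not from the release notes:

```typescript
import {getLlama, LlamaChatSession, defineChatSessionFunction} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"}); // hypothetical path
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

const functions = {
    getWeather: defineChatSessionFunction({
        description: "Get the current weather in a city",
        params: {
            type: "object",
            properties: {
                city: {type: "string"}
            }
        },
        // the handler's parameters are inferred from the schema above,
        // so `city` is typed as a string here
        handler({city}) {
            return {city, temperatureCelsius: 21}; // stubbed result
        }
    })
};

const res = await session.prompt("What's the weather in Rome?", {functions});
console.log(res);
```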
Bug Fixes
- check path existence before reading its content (#428) (ca6b901)
- partial tokens handling (#428) (ca6b901)
- uncaught exception (#430) (599a161)
- Electron template: non-Latin text formatting (#430) (599a161)
Shipped with `llama.cpp` release b4753
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.5.0 (63a1066)
3.5.0 (2025-01-31)
Features
- shorter model URIs (#421) (73454d9) (documentation: Model URIs) (example below)
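A minimal sketch of the shortened `hf:<user>/<repo>:<quant>` URI form with `resolveModelFile`; the specific repo and quant below are illustrative, not from the release notes:

```typescript
import path from "path";
import {fileURLToPath} from "url";
import {resolveModelFile, getLlama} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// Shorter URI form: the quant is appended after a colon
// instead of spelling out the full .gguf file name
const modelPath = await resolveModelFile(
    "hf:mradermacher/Llama-3.2-3B-Instruct-GGUF:Q4_K_M", // illustrative repo and quant
    path.join(__dirname, "models") // download destination
);

const llama = await getLlama();
const model = await llama.loadModel({modelPath});
```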
Bug Fixes
Shipped with `llama.cpp` release b4600
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.4.3 (6e4bf3d)
3.4.3 (2025-01-30)
Bug Fixes
Shipped with `llama.cpp` release b4599
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.4.2 (314d7e8)
3.4.2 (2025-01-27)
Bug Fixes
- metadata string encoding (#420) (314d7e8)
- Vulkan parallel decoding (#420) (314d7e8)
- try auth token on 401 response (#420) (314d7e8)
Shipped with `llama.cpp` release b4567
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.4.1 (86e1bee)