Releases · microsoft/onnxruntime-genai · GitHub
v0.8.3
dc2d850
This commit was created on GitHub.com and signed with GitHub’s verified signature.
Assets 13
v0.8.2
fea4e96
What's changed
New features
- Use Accuracy level 4 for webgpu by default by @guschmue (#1474)
- Enable guidance by default on macos by @ajindal1 (#1514)
Bug fixes
- Remove position_id and fix context phase KV shapes for in-place cache buffer support by @anujj (#1505)
- Update Extensions Commit for 0.8.2 by @sayanshaw24 (#1519)
- Update Extensions Commit for another DeepSeek Fix by @sayanshaw24 (#1521)
Packaging and testing
Full Changelog: v0.8.1...v0.8.2
Assets 13
v0.8.1
caba648
What's changed
New features
- Integrate tools input into Chat Template API by @sayanshaw24 (#1472)
- NvTensorRtRtx EP option in GenAI - model builder by @BLSharda (#1453)
- Enable TRT multi profile option though provider option by @anujj (#1493)
Bug fixes
- Always cast bf16 logits to fp32 by @nenad1002 (#1479)
Examples and documentation
- Update Chat Template Examples for Tools API change by @sayanshaw24 (#1506)
- Fix model chat example for rewind by @ajindal1 (#1480)
Model builder changes
- Fix from pretrained method for quantized models by @kunal-vaishnavi (#1503)
- Fix missing parameter name by @xadupre (#1502)
- minor change to support qwen3 by @guschmue (#1499)
- Fix how torch tensors are saved by @kunal-vaishnavi (#1476)
- Support k_quant in model builder by @jiafatom (#1444)
Dependency updates
- Update to stable release of Microsoft.Extensions.AI.Abstractions by @stephentoub (#1489)
- Update to M.E.AI 9.4.3-preview.1.25230.7 by @stephentoub (#1443)
Full Changelog: v0.8.0...v0.8.1
Assets 13
v0.8.0
What's Changed
New Features
- Add Chat Template API Changes by @sayanshaw24 in #1398
- Add Python and C# bindings for Chat Template API by @sayanshaw24 in #1411
- Support for gemma3 model by @baijumeswani in #1374
- Support more QNN models with different model structures by @baijumeswani in #1322
- Add ability to load audio from bytes, to match images API by @RyanUnderhill in #1304
- Add support for DML Graph Capture to improve speed by @aciddelgado in #1305
- Added OnnxRuntimeGenAIChatClient ctor with Config. by @azchohfi in #1364
- Extensible AppendExecutionProvider and expose OrtSessionOptions::AddConfigEntry directly by @RyanUnderhill in #1384
- OpenVINO: Model Managed KVCache by @RyanMetcalfeInt8 in #1399
- Changes how the device OrtAllocators work, use a global OrtSession instead by @RyanUnderhill in #1378
- Remove audio attention mask processing and update ort-extensions by @baijumeswani in #1319
- Simplify the C API definitions and prevent any type mismatches going forward by @RyanUnderhill in #1365
Model builder updates
- Quark Quantizer Support by @shobrienDMA in #1207
- Add Gemma 3 to model builder by @kunal-vaishnavi in #1359
- Initial support for VitisAI EP by @AnanyaA-9 in #1370
- [OVEP] feat: Adding OpenVINO EP in ORT-GenAI by @ankitm3k in #1389
- Initial support for NV EP by @BLSharda in #1404
- Adapt to MatMulNBitsQuantizer in ort by @jiafatom in #1426
- Fix LM head for Gemma-2 by @kunal-vaishnavi in #1420
Bug Fixes
- Fix mismatch in Java bindings by @CaptainIRS in #1307
- Fix type mismatch in Java bindings by @CaptainIRS in #1313
- Update ort-extensions to fix tokenizer bug for phi4 by @baijumeswani in #1331
- Windows: Show more useful DLL load errors to say exactly what DLL is missing by @RyanUnderhill in #1345
- deprecate graph cap by @aciddelgado in #1338
- Support load/unload of models to avoid QNN errors on deepseek r1 1.5B by @baijumeswani in #1346
- Add missing 'value_stats' to logging API, and fix wrong default by @RyanUnderhill in #1353
- Convert tokens to list for concat by @ajindal1 in #1358
- Improve and Fix TopKTopP by @jiafatom in #1363
- Switch the order of softmax on CPU Top K by @aciddelgado in #1354
- Update pybind and fix rpath for macos and check for nullptr by @baijumeswani in #1367
- iterate over the providers by @baijumeswani in #1486
- Correctly iterate over the providers to check if graph capture is enabled by @baijumeswani in #1487
Examples and Documentation
- Update README.md by @RyanUnderhill in #1372
- Add slm engine example by @avijit-chakroborty in #1242
- Added cancellation to the streaming method of OnnxRuntimeGenAIChatClient. by @azchohfi in #1289
- Update nuget README with latest API by @natke in #1326
- Update C examples downloads by @ajindal1 in #1332
- Add Q&A Test Example in Nightly by @ajindal1 in #1277
- docs: update the doc of slm_engine to ensure consistency with the code by @dennis2030 in #1386
- C++ and python samples: follow_config support by @RyanMetcalfeInt8 in #1413
- Fix Do Sample example by @ajindal1 in #1337
- Make phi3 example Q&A rather than chat by @ajindal1 in #1392
- Fix broken link in package description by @rogerbarreto in #1360
Packaging and Testing
- Remove DirectML.dll dependency by @baijumeswani in #1342
- Add support to creating a custom nuget in the packaging pipeline by @baijumeswani in #1315
- Remove onnxruntime-genai-static library (non trivial change) by @RyanUnderhill in #1264
- Add macosx to custom nuget package by @baijumeswani in #1419
- Update the C++ clang-format lint workflow to use clang 20 by @snnn in #1418
- Add model_benchmark options to specify prompt to use. by @edgchen1 in #1328
- Add value_stats logging option to show statistical information about … by @RyanUnderhill in #1352
- Fixed the MacOS build and updated the test script. by @avijit-chakroborty in #1310
- Fix iOS packaging pipeline after static library removal by @RyanUnderhill in #1316
- fix bug in python benchmark script by @thevishalagarwal in #1206
- Fix macos package by @baijumeswani in #1347
- Missing *.dylib in package_data, so Mac would not package our shared libraries by @RyanUnderhill in #1341
Dependency Updates
- Update upload Artifact version by @ajindal1 in #1274
- Update to M.E.AI 9.3.0-preview.1.25161.3 by @stephentoub in #1317
- Update android min sdk version to 24 by @baijumeswani in #1324
- Update torch to 2.5.1 by @baijumeswani in #1343
- Update Pipelines for S360 by @ajindal1 in #1323
- Update Nuget pkg name by @ajindal1 in #1351
- update version to 0.8.0 by @baijumeswani in #1376
- Update custom nuget packaging logic by @baijumeswani in #1377
- Update Microsoft.Extensions.AI.Abstractions to 9.4.0-preview.1.25207.5 by @stephentoub in #1388
- Bump torch from 2.5.1 to 2.6.0 in /test/python/macos/torch by @dependabot in #1408
- Bump torch from 2.5.1+cu124 to 2.6.0+cu124 in /test/python/cuda/torch by @dependabot in #1409
- Bump torch from 2.5.1+cpu to 2.7.0 in /test/python/cpu/torch by @dependabot in #1422
- pin cmake version by @snnn in #1424
New Contributors
- @avijit-chakroborty made their first contribution in #1242
- @CaptainIRS made their first contribution in #1307
- @AnanyaA-9 made their first contribution in #1370
- @dennis2030 made their first contribution in #1386
- @ankitm3k made their first contribution in #1389
- @RyanMetcalfeInt8 made their first contribution in #1399
Full Changelog: v0.7.1...v0.8.0
Assets 13
v0.7.1
efab081
Release Notes
- Add AMD Quark Quantizer Support #1207
- Added Gemma 3 to model builder #1359
- Updated Phi-3 Python Q&A example to be consistent with C++ example #1392
- Updated Microsoft.Extensions.AI.Abstractions to 9.4.0-preview.1.25207.5 #1388
- Added OnnxRuntimeGenAIChatClient constructor with Config #1364
- Improve and Fix TopKTopP #1363
- Switch the order of softmax on CPU Top K #1354
- Updated custom nuget packaging logic #1377
- Updated pybind and fix rpath for macos and check for nullptr #1367
- Convert tokens to list for concat to accommodate breaking API change in tokenizer #1358
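The TopKTopP fixes above concern top-k plus top-p (nucleus) sampling of logits. As a generic sketch of the technique — not the project's actual kernel-level implementation — the filtering step can be written in plain numpy:

```python
import numpy as np

def top_k_top_p_filter(logits: np.ndarray, k: int, p: float) -> np.ndarray:
    """Return sampling probabilities after top-k, then top-p (nucleus) filtering.

    Illustrative only; onnxruntime-genai implements this inside its samplers.
    """
    order = np.argsort(logits)[::-1]           # token indices, most likely first
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                       # softmax
    sorted_probs = probs[order]
    cumulative = np.cumsum(sorted_probs)

    keep = np.zeros_like(probs, dtype=bool)
    # Keep a token if it is inside the top-k AND inside the smallest
    # prefix whose cumulative probability mass reaches p.
    for rank, idx in enumerate(order):
        if rank < k and (rank == 0 or cumulative[rank - 1] < p):
            keep[idx] = True

    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()           # renormalize survivors
```

The order of operations (softmax before or after truncation, k before or after p) is exactly the kind of detail the "Switch the order of softmax on CPU Top K" fix addresses.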
Assets 13
v0.7.0
8a48d7b
Release Notes
We are excited to announce the release of onnxruntime-genai version 0.7.0. Below are the key updates included in this release:
- Support for a wider variety of QNN NPU models (such as Deepseek R1).
- Removed the onnxruntime-genai static library. All language bindings now interface with onnxruntime-genai through the onnxruntime-genai shared library.
- All return types from the onnxruntime-genai Python package are now numpy array types. Previously, the return type of tokenizer.encode was a Python list; this broke examples/python/model-qa.py, which used '+' to concatenate two lists. np.concatenate must be used instead in these cases.
- Abstracted execution-provider-specific code into shared libraries of their own (for example, onnxruntime-genai-cuda for CUDA and onnxruntime-genai-dml for DML). This allows, for example, the onnxruntime-genai-cuda package to also be used on non-CUDA machines.
- Support for multi-modal models (text, speech, and vision) such as phi4-multi-modal.
- Added an IChatClient implementation to the onnxruntime-genai C# bindings.
- Exposed the model type through the Python bindings.
- Code and performance improvements for the DML EP.
This release also includes several bug fixes that resolve issues reported by users.
Assets 13
v0.6.0
97d44f6
Release Notes
We are excited to announce the release of onnxruntime-genai version 0.6.0. Below are the key updates included in this release:
- Support for contextual or continuous decoding, which allows users to carry out multi-turn, conversation-style generation.
- Support for new models such as Deepseek R1, AMD OLMo, IBM Granite and others.
- Python 3.13 wheels have been introduced
- Support for generation with models sourced from Qualcomm's AI Hub. This work also includes publishing a nuget package, Microsoft.ML.OnnxRuntimeGenAI.QNN, for the QNN EP.
- Support for the WebGPU EP.
This release also includes performance improvements to optimize memory usage and speed. In addition, there are several bug fixes that resolve issues reported by users.
Assets 13
v0.5.2
27bcf6c
Release Notes
Patch release 0.5.2 adds:
- Fixes for bugs #1074, #1092 via PRs #1065 and #1070
- Fix Nuget sample in package README to show correct disposal of objects
- Added extra validation via PRs #1050 #1066
Features in 0.5.0:
- Support for MultiLoRA
- Support for multi-frame for Phi-3 vision and Phi-3.5 vision models
- Support for the Phi-3 MoE model
- Support for NVIDIA Nemotron model
- Support for the Qwen model
- Addition of the Set Terminate feature, which allows users to cancel mid-generation
- Soft capping support for Group Query Attention
- Extend quantization support to embedding and LM head layers
- Mac support in published packages
Known issues
- Models running with DirectML do not support batching
- Python 3.13 is not supported in this release
Assets 11
v0.5.1
e8cd6bc
Release Notes
In addition to the features in the 0.5.0 release, this release adds:
- Add ability to choose provider and modify options at runtime
- Fixed data leakage bug with KV caches
Features in 0.5.0:
- Support for MultiLoRA
- Support for multi-frame for Phi-3 vision and Phi-3.5 vision models
- Support for the Phi-3 MoE model
- Support for NVIDIA Nemotron model
- Support for the Qwen model
- Addition of the Set Terminate feature, which allows users to cancel mid-generation
- Soft capping support for Group Query Attention
- Extend quantization support to embedding and LM head layers
- Mac support in published packages
Known issues
- Models running with DirectML do not support batching
- Python 3.13 is not supported in this release
Assets 11
v0.5.0
826f6aa
Release Notes
- Support for MultiLoRA
- Support for multi-frame for Phi-3 vision and Phi-3.5 vision models
- Support for the Phi-3 MoE model
- Support for NVIDIA Nemotron model
- Support for the Qwen model
- Addition of the Set Terminate feature, which allows users to cancel mid-generation
- Soft capping support for Group Query Attention
- Extend quantization support to embedding and LM head layers
- Mac support in published packages
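The soft-capping feature listed above bounds attention scores before softmax. As a sketch of the standard formula only (the actual Group Query Attention kernel applies it internally, and the cap value below is illustrative; models such as Gemma-2 define their own constants):

```python
import numpy as np

def soft_cap(scores: np.ndarray, cap: float = 50.0) -> np.ndarray:
    # Soft capping squashes scores into (-cap, cap) via a scaled tanh.
    # Near zero the mapping is approximately the identity, so small
    # scores pass through almost unchanged while outliers are bounded.
    return cap * np.tanh(scores / cap)
```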
Known issues
- Models running with DirectML do not support batching
- Python 3.13 is not supported in this release
Assets 11