Releases: microsoft/Olive
Olive-ai 0.9.3
New Features:
- Compatibility with Windows ML for ONNX model inference and evaluation (#2052, #2056, #2059, #2084).
- `Gptq` quantization supports `lm_head` quantization and more generic weight packing (#2137).
Improvements
- `optimize` CLI supports the `WebGPU` execution provider (#2076) and the `NVTensorRtRTX` execution provider (#2078).
- `quantize` CLI supports the `Gptq` pass as an implementation (#2115).
- ONNX static quantization supports strided calibration data for lower memory usage (#2086).
- Extra options can be provided directly to the `ModelBuilder` pass (#2107).
- `LMEvaluator` has a new ORT backend with `IOBinding`, leading to a large speedup in runtime (#2133).
- `OnnxFloatToFloat16` allows more granular control through `op_include_list` and `node_include_list` (#2134); see the sketch after this list.
- `AIMET` quantization pass: support for excluding op types (#2055), pre-quantized models (#2111), LLM-augmented dataloaders (#2108), LPBQ (#2119), and Adaround (#2140).
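For context, here is a minimal sketch of how the new include lists might be wired into a workflow config; the model path, node name, and output directory below are placeholder assumptions, not values from this release.

```python
# A hedged sketch, not a verbatim recipe: restrict float16 conversion to
# chosen op types and named nodes via the options added in #2134.
from olive.workflows import run as olive_run

config = {
    "input_model": {"type": "OnnxModel", "model_path": "model.onnx"},  # placeholder
    "passes": {
        "to_fp16": {
            "type": "OnnxFloatToFloat16",
            "op_include_list": ["MatMul", "Gemm"],          # only these op types
            "node_include_list": ["/layers.0/qkv/MatMul"],  # hypothetical node name
        }
    },
    "output_dir": "models/fp16",
}
olive_run(config)
```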
Deprecation
As per the deprecation warning in the previous release, the following Azure ML-related features have been removed:
- Azure ML system
- Azure ML resource types: model, datastore, job outputs.
- Remote workflow
- Azure ML artifact packaging
Other removed features include:
- `IsolatedORT` system (#2070)
- Quantization Aware Training (#2089)
- `AppendPrePostProcessingOps` pass (#2090)
- `SNPE` passes (#2098)
Recipes Migration
All recipes have been migrated to the olive-recipes repository.
Olive-ai 0.9.2
New Features:
- Selective Mixed Precision. (#1898)
- Native GPTQ Implementation with support for Selective Mixed Precision. (#1949)
- Blockwise RTN Quantization for ONNX models. (#1899)
- Ability to add custom metadata in ONNX model. (#1900)
- New simplified `olive optimize` CLI command and the `olive.quantize()` Python API for effortless model optimization with minimal developer input. See the CLI usage and Python API docs for more details. (#1996) A sketch of the Python API follows this list.
- New `olive run-pass` command gives advanced users the ability to run individual passes. (#1904)
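As a rough illustration of the new Python API, a hedged sketch follows; `olive.quantize()` itself is confirmed by this release, but the keyword arguments shown mirror the CLI options and are assumptions, so consult the Python API docs for the actual signature.

```python
# A hedged sketch of the new one-call Python API; argument names are
# assumptions modeled on the CLI options, not the verified signature.
import olive

result = olive.quantize(
    model_name_or_path="microsoft/Phi-3.5-mini-instruct",  # placeholder model
    algorithm="rtn",                                        # assumed option name
)
```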
New Integrations
- GPTQModel. (#1999)
- AIMET (#2028). This is a work in progress.
- ONNX model support while targeting OpenVINO. (#2019)
- `QuarkQuantization`: AMD Quark quantization for LLMs. (#2010)
- `VitisGenerateModelLLM`: optimized LLM model generation for the Vitis AI Execution Provider. (#2010)
Improvements
- New graph surgeries including `dla transformers`, `DecomposeRotaryEmbedding` and `DecomposeQuickGelu`. (#2018, #1972, #2000)
- Exposed `WorkflowOutput` in the Python API and added unified APIs for CLI commands. (#1907) See the sketch after this list.
- Refactored Docker system for simplified setup and execution. (#1990)
- ExtractAdapters:
  - Added support for DoRA and LoHa adapters. (#1611)
- NVMO quantization:
- OnnxPeepholeOptimizer:
  - Removed `fuse_transpose_qat` and `patch_unsupported_argmax_operator`. (#1976)
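For the `WorkflowOutput` change above, a hedged sketch of the unified Python API; `get_best_candidate()` and `model_path` are assumed accessor names used here for illustration.

```python
# A hedged sketch: run a workflow and inspect the returned WorkflowOutput.
# get_best_candidate() / model_path are assumed names, shown for illustration.
from olive.workflows import run as olive_run

workflow_output = olive_run("workflow_config.json")
best = workflow_output.get_best_candidate()
print(best.model_path)
```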
Deprecation
Azure ML will be deprecated in the next release, including:
- Azure ML system
- Azure ML workspace model
- Remote workflow
Recipes Migration
All recipes are being migrated to the olive-recipes repository. New recipes will be added and maintained there going forward.
Olive-ai 0.9.1
Minor release to fix the following issues:
- OpenVINO Encapsulation pad_token_id fix (#1847)
- Add support for Nvidia TensorRT RTX execution provider in Olive (#1852)
- Basic support for ONNX auto EP selection introduced in onnxruntime v1.22.0 (#1854, #1863)
- Add Nvidia TensorRT-RTX Olive recipe for vit, clip and bert examples (#1858)
- Gate `optimum[openvino]` version to <=1.24 (#1864)
Olive-ai 0.9.0
Feature Updates
- Implement lm-eval-harness based LLM quality evaluator for ONNX GenAI models #1720
- Update minimum supported target opset for ONNX to 17. #1741
- QDQ support for ModelBuilder pass #1736
- Refactor OnnxOpVersionConversion to conditionally use onnxscript version converter #1784
- HQQ Quantizer Pass #1799, #1835
- Introducing global definitions for Precision & PrecisionBits #1808
- Improvements in OnnxPeepholeOptimizer #1697, #1698
New Passes
- OnnxScriptFusion: ONNX script fusion
- OpenVINOEncapsulation, OpenVINOReshape, OpenVINOIoUpdate: OpenVINO encapsulation #1754
- TrtMatMulToConvTransform: Convert non-4D MatMul to Transpose-Conv-Transpose sequence
- OpenVINOOptimumConversion: Add optimum Intel® pass for converting a Huggingface Model to an OpenVINO Model
- Graph Surgeries
  - MatMulAddGemm: Graph surgery to fuse a MatMul op followed by an Add op into a Gemm op
  - PowReduceSumPowDiv2LpNorm: Graph surgery to merge a Pow -> ReduceSum -> Pow -> Div pattern into an L2Norm
- OnnxHqqQuantization: Implements 4-bit HQQ quantization (see the sketch after this list).
- VitisAIAddMetaData: Adds metadata to an ONNX model based on specified model attributes.
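As referenced in the OnnxHqqQuantization item above, a hedged sketch of enabling the new pass from a workflow config dict; the model path and output directory are placeholders, and pass options are left at their defaults.

```python
# A hedged sketch: convert a Hugging Face model to ONNX, then apply the
# new 4-bit HQQ quantization pass. Model path and output_dir are placeholders.
from olive.workflows import run as olive_run

config = {
    "input_model": {"type": "HfModel", "model_path": "microsoft/phi-2"},
    "passes": {
        "conversion": {"type": "OnnxConversion"},
        "hqq": {"type": "OnnxHqqQuantization"},
    },
    "output_dir": "models/phi2-hqq",
}
olive_run(config)
```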
New/Updated Examples
- Alibaba-NLP/gte #1695
- DeepSeek
  - OpenVINO #1786
- Google BERT
- Google VIT
- Intel BERT
- Laion Clip
- Llama3
  - OpenVINO #1786
- Meta Llama3
  - QDQ #1707
- OpenAI Clip (16 and 32)
- Phi3.5
- Phi4
  - OpenVINO #1828
- Qwen
- Resnet50
- Sentence Transformers CLIP
- Stable Diffusion
  - QDQ #1730
Deprecated Examples
Deprecated Passes
- InsertBeamSearchOp #1805
Olive-ai 0.8.0
New Features (Passes)
- `QuaRot` performs offline weight rotation.
- `SpinQuant` performs offline weight rotation.
- `StaticLLM` converts a dynamic-shaped LLM into a static-shaped LLM for NPUs.
- `GraphSurgeries` applies surgeries to an ONNX model; surgeries are modular and individually configurable (see the sketch after this list).
- `LoHa`, `LoKr` and `DoRA` finetuning.
- `OnnxQuantizationPreprocess` applies quantization preprocessing.
- `EPContextBinaryGenerator` creates EP-specific context binary ONNX models.
- `ComposeOnnxModels` composes split ONNX models.
- `OnnxIOFloat16ToFloat32` replaced with the more generic `OnnxIODataTypeConverter`.
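As referenced in the `GraphSurgeries` item above, a hedged sketch of a pass entry with an individually configured surgery; the `surgeon` key and the `RenameInputs` options are assumptions for illustration.

```python
# A hedged sketch of a GraphSurgeries pass entry for a workflow config;
# the surgeon name and its options are illustrative assumptions.
graph_surgeries_pass = {
    "type": "GraphSurgeries",
    "surgeries": [
        {"surgeon": "RenameInputs", "old_names": ["input"], "new_names": ["x"]},
    ],
}
```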
Command Line Interface
New command line tools have been added and existing tools have been improved.
- `generate_config_file` option to save the workflow config file.
- `extract-adapters` command to extract multiple adapters from a PyTorch model.
- Simplified `quantize` command.
Improvements
- Better output model structure for workflow and CLI runs.
- New `no_artifacts` option in workflow config to disable saving run artifacts such as footprints.
- Hf data preprocessing:
  - Dataset is truncated if `max_samples` is set.
  - Empty texts are filtered.
  - `padding_side` is configurable and defaults to `"right"`.
- `SplitModel` pass keeps QDQ nodes together in the same split.
- `OnnxPeepholeOptimizer`: constant folding + onnxoptimizer added.
- `CaptureSplitInfo`: separate split for memory-intensive modules.
- `OnnxConversion`:
  - Dynamic shapes for dynamo export.
  - `optimize` option to perform constant folding and redundancy elimination on the dynamo-exported model.
- `GPTQ`: default wikitext calibration dataset; patch to support newer versions of `transformers`.
- `MatMulNBitsToQDQ`: `nodes_to_exclude` option.
- `SplitModel`: `split_assignments` option to provide custom split assignments (see the sketch after this list).
- `CaptureSplitInfo`: `block_to_split` can be a single block (str) or multiple blocks (list).
- `OnnxMatMul4Quantizer`: support for onnxruntime 1.18+.
- `OnnxQuantization`:
  - Support for onnxruntime 1.18+.
  - `op_types_to_exclude` option.
  - `LLMAugmentedDataLoader` augments the calibration data for LLMs with KV cache and other missing inputs.
- New document theme and organization.
- Reimplemented search logic to include passes in the search space.
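As referenced in the `SplitModel` item above, a hedged sketch of custom split assignments; the mapping format (module name to split index) is an assumption for illustration.

```python
# A hedged sketch of the split_assignments option; module names and the
# name-to-split-index mapping format are illustrative assumptions.
split_model_pass = {
    "type": "SplitModel",
    "split_assignments": {
        "model.embed_tokens": 0,
        "model.layers.0": 0,
        "model.layers.1": 1,
    },
}
```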
Examples:
- New QNN EP examples:
  - SLMs:
    - Phi-3.5
    - Deepseek R1 Distill
    - Llama 3.2
  - MobileNet
  - ResNet
  - CLIP VIT
  - BAAI/bge-small-en-v1.5
  - Table Transformer Detection
  - adetailer
- Deepseek R1 Distill Finetuning
- `timm` MobileNet
Olive-ai 0.7.1.1
Same as 0.7.1, with updated dependencies for the `nvmo` extra and the NVIDIA TensorRT Model Optimizer example doc. Refer to the 0.7.1 Release Notes for other details.
Olive-ai 0.7.1
Command Line Interface
New command line tools have been added and existing tools have been improved.
- `olive --help` works as expected.
- `auto-opt`:
  - The command chooses a set of passes compatible with the provided model type, precision and accelerator information.
  - New options to split a model, using either `--num-splits` or `--cost-model`.
Improvements
- `ExtractAdapters`:
  - Support LoRA adapter nodes in Stable Diffusion unet or text-embedding models.
  - Default initializers for quantized adapters so the model can run without adapter inputs.
- `GPTQ`:
  - Avoid saving unused bias weights (all zeros).
  - Set `use_exllama` to `False` by default to allow exporting and fine-tuning external GPTQ checkpoints.
- `AWQ`: Patch autoawq to run quantization on newer transformers versions.
- Atomic `SharedCache` operations.
- New `CaptureSplitInfo` and `Split` passes to split models into components. The number of splits can be user-provided or inferred from a cost model.
- `disable_search` is deprecated from pass configuration in an Olive workflow config.
- `OrtSessionParamsTuning` redone to use Olive search features.
- `OrtModelOptimizer` renamed to `OrtPeepholeOptimizer`, plus some bug fixes.
Examples:
- Stable Diffusion: New MultiLora example
- Phi3: New int quantization example using `nvidia-modelopt`
Olive-ai 0.7.0
Command Line Interface (CLI)
Introducing a new command line interface for Olive that supports executing well-defined, concrete workflows without the user ever having to create or edit a config manually. CLI workflow commands can be chained, i.e. the output of one execution can be fed as input to the next, to ease operating the entire pipeline. Below is a list of a few CLI workflow commands:
- finetune: Fine-tune a model on a dataset using peft and optimize the model for ONNX Runtime
- capture-onnx-graph: Capture ONNX graph for a Huggingface model.
- auto-opt: Automatically optimize a model for performance.
- quantize: Quantize model using given algorithm for desired precision and target.
- tune-session-params: Automatically tune the session parameters for a ONNX model.
- generate-adapter: Generate ONNX model with adapters as inputs.
Improvements
- Added support for yaml based workflow config
- Streamlined DataConfig management
- Simplified workflow configuration
- Added shared cache support for intermediate models and supporting data files
- Added QuaRot quantization pass for PyTorch models
- Added support to evaluate generative PyTorch models
- Streamlined support for user-defined evaluators
- Enabled use of lm-evaluation-harness for generative model evaluations
Examples
- Llama
- Updated multi-lora example to use ORT generate() API
- Updated to demonstrate use of shared cache
- Phi3
- Updated to demonstrate evaluation using lm-eval harness
- Updated to showcase search across three different QLoRA ranks
- Added Vision tutorial
Olive-ai 0.6.2
Workflow config
- Support YAML files as workflow config file. #1191
- The workflow id feature is a prerequisite for running a workflow on a remote VM. With this feature (#1179):
  - Cache dir becomes `<cache_dir>/<workflow_id>`.
  - The Olive config will be automatically saved to the cache dir.
  - Users can specify `workflow_id` in the config file (see the sketch after this list).
  - The default workflow_id is `default_workflow`.
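As referenced above, a hedged sketch of setting a custom workflow id; the surrounding config keys are placeholders.

```python
# A hedged sketch: with cache_dir "cache" and this workflow_id, run
# artifacts land under cache/my_workflow. Other keys are placeholders.
config = {
    "workflow_id": "my_workflow",  # defaults to "default_workflow"
    "input_model": {"type": "ONNXModel", "model_path": "model.onnx"},
    # passes, engine, etc. omitted
}
```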
Passes (optimization techniques)
- Accept SNPE DLC model for QNN context binary generator #1188
Data
- Remove params_config and components/component_args. All component-specific parameters are now grouped in four separate objects (see the sketch after this list): #1187
- load_dataset_config
- pre_process_data_config
- post_process_data_config
- dataloader_config
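As referenced above, a hedged sketch of a data config grouped into the four component objects; the four object names come from this release, while the option values inside each are placeholder assumptions.

```python
# A hedged sketch of the regrouped data config; the four component config
# objects come from this release, the values inside are placeholders.
data_config = {
    "name": "calib_data",
    "load_dataset_config": {"data_name": "wikitext", "split": "train"},
    "pre_process_data_config": {"max_samples": 128},
    "post_process_data_config": {},
    "dataloader_config": {"batch_size": 1},
}
```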
Docs
- Add Olive workflow schema to the doc website. This schema file can be used in IDEs when writing workflow configs. #1190