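2025.05.09: PaddleMIX 3.0.0-beta released

Multimodal Understanding

New models: Qwen2VL/Qwen2.5VL series, DeepSeek-VL2, miniCPM-V 2.6, Janus series, LLaVA-Critic, LLaVA-DenseConnector, LLaVA-OneVision, GOT-OCR2.0, mPLUG-Owl3
PP series models: released PP-DocBee, a self-developed multimodal large model for document understanding, which reaches SOTA among models of the same parameter scale on authoritative academic benchmarks for English document understanding
Toolchain upgrades: improved high-performance inference deployment, with new support for the Qwen2.5VL series; inference performance on A800 is 11.5% ahead of vLLM. Training and inference for LLaVA and InternVL2 are adapted to Ascend 910B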

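Multimodal Generation

New models: Open-MAGVIT2, and the text-to-video models CogVideoX and HunyuanVideo
PP series models: released PP-VCtrl, a self-developed controllable video model that supports video generation under a variety of control conditions
Toolchain upgrades: released ppdiffusers 0.29.1, adding support for SD3 ControlNet and SD3.5. SD3 high-performance inference is on par with TensorRT. LoRA training and inference for SD3 and SDXL are adapted to Ascend 910B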

@xiaoguoguo626807

v2.1.0

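Updates

Released PP-InsCapTagger, a self-developed multimodal data capability tagging model. It can be used to analyze and filter data; experimental cases show it can cut the data volume by 50% while preserving model quality, substantially improving training efficiency.
Added cutting-edge models including Qwen2-VL, InternVL2, and Stable Diffusion 3 (SD3).
The multimodal large models InternVL2, LLaVA, SD3, and SDXL are adapted to Ascend 910B, providing training and inference on domestically produced compute chips.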

What's Changed

【pir 】modify dy2static Sd and 3. Grounding DINO model by @xiaoguoguo626807 in #689
fix llava pretrain config by @pkhk-1 in #685
Re-network the DIT, fix some parameters, and simplify the model networking code by @chang-wenbin in #632
update DIT doc by @chang-wenbin in #693
[NPU] Add llava npu doc by @Birdylx in #694
SD3 inference optimization: avoid synchronization by @chang-wenbin in #695
Reduce redundant copies and fix bugs by @chang-wenbin in #699
Add Qwen2-VL infer codes by @nemonameless in #698
[doc] Update requirements by @nemonameless in #703
Llava bug by @LokeZhou in #704
Fix is inference mode by @zhoutianzi666 in #711
update readme by @lyuwenyu in #705
update opensora video save method by @westfish in #712
Limit the installed version of paddlenlp and fix bugs of llava-next. by @luyao-cv in #716
Optimize the SD3 transformer part by @zhoutianzi666 in #713
[wip] add mix scheme by @lyuwenyu in #664
[NPU] InternVL2 supports npu training by @Birdylx in #714
Add SD3 DreamBooth by @westfish in #686
remove phi3 in internvl2 and refine format by @nemonameless in #715
add flash_atten for qw2vl by @luyao-cv in #723
[NPU] sdxl support NPU training by @wangna11BD in #719
[NPU] sdxl lora support NPU training by @warrentdrew in #718
Adapt fa for npu by @LielinJiang in #706
[NPU] fix readme doc for SDXL LoRA training by @warrentdrew in #724
[npu]sd3 dreambooth adapt for npu by @LielinJiang in #726
add pp-inscaptagger by @pkhk-1 in #727
ADD SD3 batch_parallel by @chang-wenbin in #731
support auto parallel in dit and largedit by @jeff41404 in #551
add env_run.sh and correct packages version by @luyao-cv in #733
[NPU] Fix typo by @Birdylx in #696
paddlemix v2.1 readme by @lyuwenyu in #734
Fix paddlenlp develop version compatibility error_10-11 by @Xiaobin-Lu in #735
Fix qwen2vl video and image preprocessing by @luyao-cv in #737
[wip] update v2.1 readme by @lyuwenyu in #736
fix internvl2 minimonkey dataset docs by @nemonameless in #741
fix tests of evaclip and internvl2 by @nemonameless in #746
image2text_generation rm use_fast by @LokeZhou in #744
fix readme for llava_next_interleave by @luyao-cv in #748
support Qwen2-VL sft training by @nemonameless in #739
fix dit training by @nemonameless in #752
fix tests by @nemonameless in #753
remove use_fast in AutoTokenizer by @warrentdrew in #747
fix dit weights convert to ppdiffusers by @nemonameless in #759
[PPDiffusers]fix bugs and release 0.29.0 by @westfish in #742
autolabel fix nltk download by @LokeZhou in #763
[NPU] fix npu llava infer by @Birdylx in #757
Add npu model list by @nepeplwu in #758
Fix docs of by @nemonameless in #767
merge upstream readme by @luyao-cv in #766
correct huggingface_hub version by @luyao-cv in #771
[NPU] Refine doc by @Birdylx in #774

New Contributors

@xiaoguoguo626807 made their first contribution in #689
@chang-wenbin made their first contribution in #632
@wangna11BD made their first contribution in #719
@LielinJiang made their first contribution in #706
@jeff41404 made their first contribution in #551
@Xiaobin-Lu made their first contribution in #735
@nepeplwu made their first contribution in #758

Full Changelog: https://github.com/PaddlePaddle/PaddleMIX/commits/v2.1.0

v2.0.0

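Multimodal Understanding

New models: LLaVA (v1.5-7b, v1.5-13b, v1.6-7b), CogAgent, CogVLM, Qwen-VL, InternLM-XComposer2
Dataset enhancements: added chatml_dataset, a data-loading scheme for image-text conversations; it can be adapted with a custom chat_template file and supports mixed datasets
Toolchain upgrades: added the Auto module, which unifies the SFT training workflow and is compatible with both full-parameter and LoRA training. Added the mixtoken training strategy, which improves SFT throughput by 5.6x. Added inference deployment for Qwen-VL and LLaVA, improving inference performance by 2.38x over torch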
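
The release note does not describe how mixtoken works. Assuming it refers to packing several short samples into one fixed-length sequence so that fewer slots are wasted on padding, the idea can be sketched as below. The names `pack_samples` and `padding_ratio` are hypothetical and are not part of the PaddleMIX API; this is a greedy first-fit illustration, not the project's implementation.

```python
def pack_samples(samples, max_len):
    """Greedy first-fit packing of token-id lists into sequences of at most max_len.

    Hypothetical helper for illustration only (not a PaddleMIX function).
    """
    packs = []  # each pack: {"input_ids": [...], "segment_ids": [...]}
    for sample_idx, ids in enumerate(samples):
        ids = ids[:max_len]  # truncate samples longer than one sequence
        for pack in packs:
            if len(pack["input_ids"]) + len(ids) <= max_len:
                break  # first existing pack with enough room
        else:
            pack = {"input_ids": [], "segment_ids": []}
            packs.append(pack)
        pack["input_ids"].extend(ids)
        # segment_ids record which original sample each token came from
        pack["segment_ids"].extend([sample_idx] * len(ids))
    return packs


def padding_ratio(lengths, max_len, n_sequences):
    """Fraction of slots that would be padding for the given number of sequences."""
    return 1 - sum(lengths) / (max_len * n_sequences)


samples = [[1] * 3, [2] * 5, [3] * 2, [4] * 4]
packs = pack_samples(samples, max_len=8)
lengths = [len(s) for s in samples]
unpacked = padding_ratio(lengths, 8, len(samples))  # one sample per sequence
packed = padding_ratio(lengths, 8, len(packs))      # after packing
```

In this toy case four samples fit into two sequences, so the padding share drops from 0.5625 to 0.125. A real trainer would also need an attention mask derived from something like `segment_ids` so that packed samples do not attend to each other; how PaddleMIX handles that is not stated in the release note.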

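Multimodal Generation

Video generation: support for Sora-related techniques, including training and inference for DiT, SiT, and UViT, plus the new NaViT and MAGVIT-v2 models; added the video generation models SVD and Open Sora, with fine-tuning and inference support; added the pose-controllable video generation model AnimateAnyone, the plug-and-play video generation model AnimateDiff, and the GIF video generation model Hotshot-XL;
Text-to-image model library: added LCM, a fast-inference text-to-image model, adapted for SD/SDXL training and inference;
Toolchain upgrades: released ppdiffusers 0.24.1, adding peft and accelerate backends; weight loading and saving are fully upgraded, supporting distributed, model sharding, safetensors, and other scenarios.
Ecosystem compatibility: provides a ComfyUI plugin built on ppdiffusers, supporting common tasks such as model loading and conversion, text-to-image, image-to-image, and local image editing. Added Stable Diffusion 1.5 series nodes and Stable Diffusion XL series nodes. Added 4 image generation workflow examples.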

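DataCopilot (multimodal data processing toolbox)

MMDataset, a multimodal dataset type that supports loading and exporting multiple storage formats such as JSON, H5, and JSONL, with built-in concurrent data processing interfaces (map, filter), among others
Multimodal data format tools, supporting custom data structures, data conversion, and offline format checking
Multimodal data analysis tools, supporting basic statistics, data visualization, and registration of custom functions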
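
To make the MMDataset description concrete, here is a minimal sketch of a dataset container with JSONL load/export and concurrent `map` and `filter`. `TinyMMDataset` and its methods are hypothetical stand-ins written for this illustration; the actual MMDataset class and its signatures are not shown in the release note and may differ.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


class TinyMMDataset:
    """Hypothetical stand-in for a multimodal dataset container (not the PaddleMIX API)."""

    def __init__(self, items):
        self.items = list(items)

    @classmethod
    def from_jsonl(cls, path):
        # One JSON object per non-empty line
        with open(path, encoding="utf-8") as f:
            return cls(json.loads(line) for line in f if line.strip())

    def to_jsonl(self, path):
        with open(path, "w", encoding="utf-8") as f:
            for item in self.items:
                f.write(json.dumps(item, ensure_ascii=False) + "\n")

    def map(self, fn, max_workers=4):
        # Apply fn to every item concurrently; result order matches input order
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return TinyMMDataset(pool.map(fn, self.items))

    def filter(self, pred, max_workers=4):
        # Evaluate the predicate concurrently, then keep the items that passed
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            keep = list(pool.map(pred, self.items))
        return TinyMMDataset(x for x, k in zip(self.items, keep) if k)


samples = [
    {"image": "a.jpg", "caption": "a cat on a sofa"},
    {"image": "b.jpg", "caption": ""},
    {"image": "c.jpg", "caption": "two dogs"},
]
cleaned = (
    TinyMMDataset(samples)
    .filter(lambda s: bool(s["caption"]))  # drop samples with empty captions
    .map(lambda s: {**s, "n_words": len(s["caption"].split())})
)
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "cleaned.jsonl"
    cleaned.to_jsonl(out)
    reloaded = TinyMMDataset.from_jsonl(out)
```

Threads are used here only to keep the sketch short; for CPU-bound processing a process pool would be the more likely choice.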

Releases: PaddlePaddle/PaddleMIX
