Leaderboard
Comprehensive evaluation results of instruction-guided video editing methods on IVEBench.
Results for additional instruction-guided video editing methods will be added as they become available.
Leaderboard (Short Subset)
| # | Method | Total Score | Video Quality | Instruction Compliance | Video Fidelity | Subject Consistency | Background Consistency | Temporal Flickering | Motion Smoothness | VTSS | Overall Semantic Consistency | Phrase Semantic Consistency | Instruction Satisfaction | Quantity Accuracy | Semantic Fidelity | Motion Fidelity | Content Fidelity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ditto | 0.667455 | 0.781191 | 0.49081 | 0.730363 | 0.962402 | 0.975734 | 0.96984 | 0.989527 | 0.038201 | 0.249228 | 0.243133 | 3.87 | 0.3 | 0.885714 | 0.789851 | 3.635 |
| 2 | InsV2V | 0.666815 | 0.795805 | 0.3861 | 0.818541 | 0.936759 | 0.958194 | 0.974885 | 0.974885 | 0.044564 | 0.241119 | 0.228607 | 3.0625 | 0.3 | 0.950594 | 0.855546 | 4.04875 |
| 3 | Lucy-Edit-Dev | 0.635298 | 0.820931 | 0.339448 | 0.745514 | 0.946271 | 0.964003 | 0.97851 | 0.990135 | 0.05082 | 0.238288 | 0.220275 | 2.8375 | 0.2 | 0.930882 | 0.677938 | 3.825 |
| 4 | VACE | 0.626087 | 0.798298 | 0.254199 | 0.825764 | 0.949597 | 0.965058 | 0.975686 | 0.975686 | 0.044513 | 0.23797 | 0.215348 | 2.1625 | 0.2 | 0.968574 | 0.885873 | 4.0325 |
| 5 | ICVE | 0.603252 | 0.712515 | 0.453532 | 0.643709 | 0.953842 | 0.97101 | 0.991063 | 0.995441 | 0.017079 | 0.228634 | 0.229431 | 3.6175 | 0.3 | 0.848825 | 0.457221 | 3.55 |
| 6 | Omni-Video | 0.586517 | 0.781993 | 0.43644 | 0.541117 | 0.963197 | 0.971408 | 0.976379 | 0.987218 | 0.038415 | 0.219778 | 0.228862 | 3.36 | 0.4 | 0.814609 | 0.507224 | 2.845 |
| 7 | AnyV2V | 0.57673 | 0.727088 | 0.417245 | 0.585857 | 0.892648 | 0.939805 | 0.972575 | 0.972575 | 0.026466 | 0.215903 | 0.236315 | 3.335 | 0.3 | 0.800876 | 0.815908 | 2.75 |
| 8 | StableV2V | 0.508923 | 0.691682 | 0.426694 | 0.408393 | 0.853475 | 0.919418 | 0.963825 | 0.963825 | 0.018734 | 0.197835 | 0.242328 | 3.56 | 0.2 | 0.700009 | 0.751333 | 1.7875 |
Leaderboard (Long Subset)
| # | Method | Total Score | Video Quality | Instruction Compliance | Video Fidelity | Subject Consistency | Background Consistency | Temporal Flickering | Motion Smoothness | VTSS | Overall Semantic Consistency | Phrase Semantic Consistency | Instruction Satisfaction | Quantity Accuracy | Semantic Fidelity | Motion Fidelity | Content Fidelity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ditto | 0.659281 | 0.779508 | 0.478546 | 0.719789 | 0.964755 | 0.976063 | 0.970801 | 0.989857 | 0.037547 | 0.234154 | 0.243371 | 3.925 | 0.2 | 0.857041 | 0.728152 | 3.685 |
| 2 | InsV2V | 0.65715 | 0.802357 | 0.374118 | 0.794976 | 0.901442 | 0.944001 | 0.975373 | 0.975373 | 0.04835 | 0.240611 | 0.229095 | 3.1 | 0.2 | 0.952295 | 0.678833 | 4.125 |
| 3 | Lucy-Edit-Dev | 0.648858 | 0.821 | 0.315107 | 0.810466 | 0.910351 | 0.94848 | 0.980079 | 0.991183 | 0.052671 | 0.239159 | 0.217732 | 2.645 | 0.2 | 0.970202 | 0.73463 | 4.13 |
| 4 | VACE | 0.616088 | 0.801204 | 0.267255 | 0.779804 | 0.916832 | 0.94867 | 0.959168 | 0.959168 | 0.048467 | 0.235583 | 0.215446 | 2.27 | 0.2 | 0.963778 | 0.883994 | 3.735 |
| 5 | ICVE | 0.587734 | 0.719584 | 0.403837 | 0.639781 | 0.953185 | 0.97228 | 0.99219 | 0.995999 | 0.019113 | 0.226036 | 0.228235 | 3.625 | 0 | 0.858698 | 0.483956 | 3.475 |
| 6 | Omni-Video | 0.570946 | 0.778392 | 0.424405 | 0.510041 | 0.94171 | 0.961136 | 0.968076 | 0.980258 | 0.039098 | 0.219006 | 0.229922 | 3.53 | 0.2 | 0.806642 | 0.551065 | 2.59 |
| 7 | AnyV2V | 0.55052 | 0.724021 | 0.355533 | 0.572005 | 0.836517 | 0.91594 | 0.969898 | 0.969898 | 0.028747 | 0.215527 | 0.228784 | 3.251852 | 0 | 0.796468 | 0.824666 | 2.651852 |
| 8 | StableV2V | 0.509333 | 0.693736 | 0.420937 | 0.413327 | 0.828293 | 0.905021 | 0.962683 | 0.962683 | 0.02092 | 0.204132 | 0.234756 | 3.44898 | 0.25 | 0.70401 | 0.773338 | 1.785714 |
Note: Higher values indicate better performance for all metrics.
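For offline analysis, the leaderboard rows can be re-ranked by any single metric. A minimal sketch (the rows reproduce a few entries from the short-subset table; the helper `rank_by` is illustrative, not part of IVEBench):

```python
# Sketch: sorting leaderboard rows by a chosen metric.
# Higher is better for all IVEBench metrics, so we sort descending.
# Values below are copied from the short-subset leaderboard table.

rows = [
    {"Method": "Ditto",         "Total Score": 0.667455, "Video Quality": 0.781191},
    {"Method": "InsV2V",        "Total Score": 0.666815, "Video Quality": 0.795805},
    {"Method": "Lucy-Edit-Dev", "Total Score": 0.635298, "Video Quality": 0.820931},
]

def rank_by(rows, metric):
    """Return the rows sorted descending by the given metric."""
    return sorted(rows, key=lambda r: r[metric], reverse=True)

# Ranking by Video Quality instead of Total Score changes the leader:
top_by_quality = rank_by(rows, "Video Quality")[0]["Method"]
```

The same pattern applies to the full 16-metric tables; only the dictionary keys change.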
Benchmark
Data Pipeline
Data acquisition and processing pipeline of IVEBench: 1) a curation process yielding 600 high-quality, diverse source videos; 2) a carefully designed pipeline for generating comprehensive editing prompts.
Benchmark Statistics
Statistical distributions of IVEBench
Benchmark Comparison
Attribute comparison with open-source video editing benchmarks. Our proposed IVEBench offers distinct advantages across several key dimensions.
Experiments
Qualitative Visualization
Qualitative comparison of state-of-the-art IVE methods.
Quantitative Visualization
IVEBench evaluation results of video editing models. We visualize the evaluation results of four IVE models across the 12 IVEBench metrics, normalizing the results per dimension for clearer comparison.
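Because the raw metrics live on very different scales (e.g. Motion Smoothness near 1.0 versus Instruction Satisfaction on a roughly 1–5 scale), per-dimension normalization rescales each metric independently before plotting. A minimal sketch, assuming min–max normalization across methods (the exact scheme used by IVEBench is not specified here; metric values are copied from the short-subset table):

```python
# Sketch: per-dimension min-max normalization of raw metric scores, so
# that all metrics share a common [0, 1] scale for visualization.
# The two metrics and three methods below are an illustrative subset.

raw_scores = {
    "Ditto":  {"Subject Consistency": 0.962402, "Instruction Satisfaction": 3.87},
    "InsV2V": {"Subject Consistency": 0.936759, "Instruction Satisfaction": 3.0625},
    "VACE":   {"Subject Consistency": 0.949597, "Instruction Satisfaction": 2.1625},
}

def normalize_per_dimension(scores):
    """Rescale each metric independently to [0, 1] across all methods."""
    metrics = next(iter(scores.values())).keys()
    normalized = {method: {} for method in scores}
    for metric in metrics:
        values = [scores[m][metric] for m in scores]
        lo, hi = min(values), max(values)
        span = hi - lo
        for method in scores:
            # If all methods tie on a metric, map them all to 1.0.
            normalized[method][metric] = (
                (scores[method][metric] - lo) / span if span else 1.0
            )
    return normalized
```

After normalization, the best method on each dimension sits at 1.0 and the worst at 0.0, which keeps a 12-axis radar chart readable regardless of each metric's native range.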
BibTeX
@article{chen2025ivebench,
title={IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment},
author={Chen, Yinan and Zhang, Jiangning and Hu, Teng and Zeng, Yuxiang and Xue, Zhucun and He, Qingdong and Wang, Chengjie and Liu, Yong and Hu, Xiaobin and Yan, Shuicheng},
journal={arXiv preprint arXiv:2510.11647},
year={2025}
}