RLHF-V
Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Tianyu Yu1, Yuan Yao2, Haoye Zhang1, Taiwen He1, Yifeng Han1, Ganqu Cui1, Jinyi Hu1, Zhiyuan Liu1*, Hai-Tao Zheng1*, Maosong Sun1, Tat-Seng Chua2
1. Tsinghua University   2. National University of Singapore
Abstract
Existing Multimodal Large Language Models (MLLMs) prevalently suffer from serious hallucination problems, generating text that is not factually grounded in the associated images. Our RLHF-V framework enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback.
- Fine-Grained and Diverse Human Preference Data: We collect 1.4k fine-grained human feedback samples consisting of 3.7k segment-level corrections, covering hallucination types including objects (41.2%), positions (20.3%), numbers (16.5%), attributes (10%), actions (5.3%), and others (6.8%). See the data-format sketch after this list.
- High Data Efficiency and Scalability: With just 1.4k annotated samples, we achieve a 34.8% reduction in model hallucinations. Moreover, the reduction in hallucinations becomes more pronounced as more data is used.
- Enhanced Performance and Computational Efficiency: Our fine-grained correctional human feedback enables more precise credit assignment for the desired behavior, allowing efficient training: 1 hour on 8 A100 GPUs is enough to achieve promising results.
- Outstanding Trustworthiness without Compromising Helpfulness: Our model surpasses existing open-source MLLMs in reducing hallucination rates, mitigates hallucinations caused by over-generalization, and maintains informativeness. Surprisingly, RLHF-V is even more resistant to the over-generalization problem than GPT-4V.
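As a rough illustration of what one segment-level correctional feedback record could look like, the sketch below pairs a hallucinated response with its human-corrected version and the character spans that were changed. The field names and example values are assumptions for illustration, not the released data schema.

```python
# Illustrative sketch of one fine-grained correctional feedback record.
# Field names are hypothetical; consult the released dataset for the actual schema.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CorrectionalFeedback:
    image_id: str                 # image the response is grounded in
    question: str                 # prompt given to the MLLM
    original_response: str        # model output containing hallucinations
    corrected_response: str       # human-corrected, factually grounded output
    # (start, end) character offsets in corrected_response marking the
    # segments the annotator rewrote, i.e. the segment-level corrections.
    corrected_spans: List[Tuple[int, int]] = field(default_factory=list)


example = CorrectionalFeedback(
    image_id="coco_000000123456",
    question="Describe the image in detail.",
    original_response="A man in a red shirt is walking two dogs.",
    corrected_response="A man in a blue shirt is walking one dog.",
    corrected_spans=[(11, 15), (33, 40)],  # "blue", "one dog"
)
```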
Method
The proposed RLHF-V framework:
We collect 1.4k fine-grained dense feedback samples by asking human annotators to correct the hallucinated segments in model responses. Training takes only 1 hour on 8 A100 GPUs to obtain RLHF-V-13B, which is initialized from our RLHF-V_SFT-13B.
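A minimal sketch of one way such (original, corrected) response pairs could be optimized: a DPO-style preference loss that treats the corrected response as preferred and up-weights the tokens inside the corrected segments, so that credit is assigned to the specific behavior the annotator fixed. The weighting scheme and function names below are illustrative assumptions, not the exact RLHF-V training recipe.

```python
# Minimal sketch: DPO-style preference optimization on correctional pairs,
# with tokens inside human-corrected segments up-weighted. Illustrative only.
import torch
import torch.nn.functional as F


def weighted_seq_logprob(token_logps: torch.Tensor,
                         corrected_mask: torch.Tensor,
                         weight: float = 2.0) -> torch.Tensor:
    """Sum per-token log-probs, up-weighting tokens in corrected segments.

    token_logps:    (batch, seq_len) log-probabilities of response tokens
    corrected_mask: (batch, seq_len) bool, True for tokens inside a correction
    """
    weights = 1.0 + (weight - 1.0) * corrected_mask.float()
    return (token_logps * weights).sum(dim=-1)


def preference_loss(logp_chosen: torch.Tensor,
                    logp_rejected: torch.Tensor,
                    ref_logp_chosen: torch.Tensor,
                    ref_logp_rejected: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    """DPO-style loss: corrected response is 'chosen', original is 'rejected'."""
    chosen_ratio = logp_chosen - ref_logp_chosen          # policy vs. frozen reference
    rejected_ratio = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```

In this kind of formulation, the segment weighting is what makes the feedback dense: the preference signal concentrates on the hallucinated spans instead of being spread uniformly over the whole response.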
Highlights
- Low hallucination rate while remaining informative.
- Data-efficient, with good scaling results.
- More resistant to over-generalization.
BibTeX
@article{2023rlhf-v,
  author  = {Tianyu Yu and Yuan Yao and Haoye Zhang and Taiwen He and Yifeng Han and Ganqu Cui and Jinyi Hu and Zhiyuan Liu and Hai-Tao Zheng and Maosong Sun and Tat-Seng Chua},
  title   = {RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback},
  journal = {arXiv preprint},
  year    = {2023},
}
Examples
- Short-form QA: RLHF-V gives more trustworthy answers in short-form QA.
- Long-form QA: RLHF-V generates informative image descriptions with fewer hallucinations.
- Long-form QA: RLHF-V provides detailed reasoning with fewer hallucinations.
- Long-form QA: RLHF-V is more resistant to over-generalization.