This repository was archived by the owner on Feb 1, 2025. It is now read-only.
TL;DR: We propose DOMINO, a dual-system for multi-step visual language reasoning that outperforms existing models on challenging chart question answering datasets.
DOMINO alternates between System-2 (a prompted LLM) and System-1 (a visual encoder-text decoder) to answer complex questions over charts. The text in blue callouts is generated by System-2. The text in green callouts is generated by System-1 and appended directly to System-2's generation sequence. The chart and the question are from ChartQA (Masry et al., 2022).
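The alternation described above can be sketched as a simple control loop. Note this is an illustrative sketch only: the trigger tokens ([EXT], [ANS]) and the stub functions below are assumptions for the example, not the repository's actual API.

```python
# Minimal sketch of DOMINO-style System-1 / System-2 alternation.
# All names here (EXTRACT_TOKEN, STOP_TOKEN, the stub models) are
# illustrative assumptions, not the repository's actual interface.

EXTRACT_TOKEN = "[EXT]"  # hypothetical marker System-2 emits to query System-1
STOP_TOKEN = "[ANS]"     # hypothetical marker that ends the reasoning chain

def system2_step(prefix: str) -> str:
    """Stub for the prompted LLM: continues the reasoning chain.

    A real implementation would call an LLM; this stub answers a toy
    question about a two-bar chart.
    """
    if "value of A" not in prefix:
        return f"First I need the value of bar A. {EXTRACT_TOKEN}value of A?"
    if "value of B" not in prefix:
        return f"Next I need the value of bar B. {EXTRACT_TOKEN}value of B?"
    return f"A (3) is smaller than B (5). {STOP_TOKEN}B"

def system1_extract(chart: dict, query: str) -> str:
    """Stub for the vision module: reads a value off the chart."""
    for name, value in chart.items():
        if name in query:
            return str(value)
    return "unknown"

def domino_answer(chart: dict, question: str, max_steps: int = 8) -> str:
    """Alternate between System-2 and System-1 until an answer is emitted."""
    sequence = question
    for _ in range(max_steps):
        step = system2_step(sequence)
        sequence += " " + step
        if STOP_TOKEN in step:        # final answer reached
            return step.split(STOP_TOKEN, 1)[1].strip()
        if EXTRACT_TOKEN in step:     # hand control to System-1
            query = step.split(EXTRACT_TOKEN, 1)[1]
            # System-1's output is appended directly to System-2's sequence,
            # so the next System-2 step conditions on the extracted value.
            sequence += " " + system1_extract(chart, query)
    return "no answer"
```

The key design point mirrored here is the one in the figure caption: System-1's output is not post-processed or re-prompted; it is appended directly to System-2's generation sequence, so the LLM simply continues decoding as if it had produced the extracted value itself.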
Code folders
(1) system1-vision: Fine-tuning and inference with the vision module.
(2) system2-lm: Prompting the LM to solve downstream tasks.
Fine-tuning a vision module for visual information extraction
cd system1-vision
sbatch ./scripts/finetune_deplot.sh <HOME_DIR>
After training, the checkpoint of the vision module is saved to <HOME_DIR>/outputs/checkpoint; set $VISION_CHECKPOINT to this path for later use.
Prompting LM for downstream tasks
The scripts for the different tasks are stored in system2-lm/scripts. To run a script:
cd system2-lm
./scripts/run_dualsys_chartQA.sh <HOME_DIR>
License
The code is CC-BY-NC 4.0 licensed, as found in the LICENSE file.
Citation
Please cite our paper if you use DOMINO in your work:
@misc{wang2023domino,
title={DOMINO: A Dual-System for Multi-step Visual Language Reasoning},
author={Peifeng Wang and Olga Golovneva and Armen Aghajanyan and Xiang Ren and Muhao Chen and Asli Celikyilmaz and Maryam Fazel-Zarandi},
year={2023},
eprint={},
archivePrefix={arXiv},
primaryClass={cs.CL}
}