This repository was archived by the owner on Feb 1, 2025. It is now read-only.
TL;DR: We propose DOMINO, a dual-system for multi-step visual language reasoning that outperforms existing models on challenging chart question answering datasets.
DOMINO alternates between System-2 (a prompted LLM) and System-1 (a visual encoder-text decoder) to answer complex questions over charts. The text in blue callouts is generated by System-2. The text in green callouts is generated by System-1 and appended directly to System-2's generation sequence. The chart and the question are from ChartQA (Masry et al., 2022).
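The alternation described above can be sketched as a simple control loop. Note this is an illustrative sketch only: the trigger tokens ([EXT], [ANS]) and the stub functions below are assumptions for the example, not the repository's actual API.

```python
# Minimal sketch of DOMINO-style System-1 / System-2 alternation.
# All names here (EXTRACT_TOKEN, STOP_TOKEN, the stub models) are
# illustrative assumptions, not the repository's actual interface.

EXTRACT_TOKEN = "[EXT]"  # hypothetical marker System-2 emits to query System-1
STOP_TOKEN = "[ANS]"     # hypothetical marker that ends the reasoning chain

def system2_step(prefix: str) -> str:
    """Stub for the prompted LLM: continues the reasoning chain.

    A real implementation would call an LLM; this stub answers a toy
    question about a two-bar chart.
    """
    if "value of A" not in prefix:
        return f"First I need the value of bar A. {EXTRACT_TOKEN}value of A?"
    if "value of B" not in prefix:
        return f"Next I need the value of bar B. {EXTRACT_TOKEN}value of B?"
    return f"A (3) is smaller than B (5). {STOP_TOKEN}B"

def system1_extract(chart: dict, query: str) -> str:
    """Stub for the vision module: reads a value off the chart."""
    for name, value in chart.items():
        if name in query:
            return str(value)
    return "unknown"

def domino_answer(chart: dict, question: str, max_steps: int = 8) -> str:
    """Alternate between System-2 and System-1 until an answer is emitted."""
    sequence = question
    for _ in range(max_steps):
        step = system2_step(sequence)
        sequence += " " + step
        if STOP_TOKEN in step:        # final answer reached
            return step.split(STOP_TOKEN, 1)[1].strip()
        if EXTRACT_TOKEN in step:     # hand control to System-1
            query = step.split(EXTRACT_TOKEN, 1)[1]
            # System-1's output is appended directly to System-2's sequence,
            # so the next System-2 step conditions on the extracted value.
            sequence += " " + system1_extract(chart, query)
    return "no answer"
```

The key design point mirrored here is the one in the figure caption: System-1's output is not post-processed or re-prompted; it is appended directly to System-2's generation sequence, so the LLM simply continues decoding as if it had produced the extracted value itself.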
Code folders
(1) system1-vision: Fine-tuning and inference with the vision module.
(2) system2-lm: Prompting the LM to solve downstream tasks.
Fine-tuning a vision module for visual information extraction
cd system1-vision
sbatch ./scripts/finetune_deplot.sh <HOME_DIR>
After training, the checkpoint of the vision module is saved to <HOME_DIR>/outputs/checkpoint; set $VISION_CHECKPOINT to this path for later use.
Prompting LM for downstream tasks
The scripts for the different tasks are stored in system2-lm/scripts. To run a script:
cd system2-lm
./scripts/run_dualsys_chartQA.sh <HOME_DIR>
License
The code is CC-BY-NC 4.0 licensed, as found in the LICENSE file.
Citation
Please cite our paper if you use DOMINO in your work:
@misc{wang2023domino,
title={DOMINO: A Dual-System for Multi-step Visual Language Reasoning},
author={Peifeng Wang and Olga Golovneva and Armen Aghajanyan and Xiang Ren and Muhao Chen and Asli Celikyilmaz and Maryam Fazel-Zarandi},
year={2023},
eprint={},
archivePrefix={arXiv},
primaryClass={cs.CL}
}