The official code implementation of the ACL 2023 Findings paper: Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs
We are still working on preparing the extracted visual features for download. In the meantime, you can extract visual features from the images yourself with the following instructions.
```shell
conda env create -f feature_extractor.yml
conda activate chart_feature_extractor
cd feature_extraction

# extract features for the ChartQA dataset (pick one split)
python chartqa_proposal.py --data_root /path/to/your/chartvqa --split train/val/test
```
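Each split is extracted with a separate invocation, so a small driver loop can cover all three. A minimal sketch (the data root is the placeholder path from above; swapping `print` for `subprocess.run` would actually execute each command):

```python
# Sketch: build one extraction command per split, mirroring the
# chartqa_proposal.py invocation above. data_root is a placeholder path.
data_root = "/path/to/your/chartvqa"
commands = [
    ["python", "chartqa_proposal.py", "--data_root", data_root, "--split", split]
    for split in ("train", "val", "test")
]
for cmd in commands:
    # swap print for subprocess.run(cmd, check=True) to run for real
    print(" ".join(cmd))
```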
Pre-training
After extracting the data, set `pretrain_datadir` in `ChartT5/src/chart_pretrain_data.py` to `/path/to/extracted_data/pretrain`.
Then run the following command to start the pre-training:
```shell
cd ChartT5
bash scripts/Chartpretrain_VLT5.sh 2
```
Downstream Task Fine-tuning
Chart VQA
After extracting the data, set `chartqa_root` in `ChartT5/src/chartqa_data.py` to `/path/to/extracted_data/chart_qa`. Also set `src_folder` in `ChartT5/scripts/ChartQA_VLT5.sh` to the same path.
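Since `chartqa_root` and `src_folder` must point at the same extracted directory, a quick pre-flight check can catch a mis-set path before a long fine-tuning run fails. A hypothetical sanity check (the path is the placeholder from above; `check_data_dir` is not part of the repository):

```python
from pathlib import Path

# Hypothetical pre-flight check: both chartqa_root (in chartqa_data.py) and
# src_folder (in ChartQA_VLT5.sh) should point at this extracted directory.
chartqa_root = Path("/path/to/extracted_data/chart_qa")  # placeholder path

def check_data_dir(root: Path) -> bool:
    """Return True if the extracted ChartQA directory exists."""
    if not root.is_dir():
        print(f"missing: {root} -- run feature extraction first")
        return False
    print(f"ok: {root}")
    return True

check_data_dir(chartqa_root)
```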
Then run the following command to start the fine-tuning:
```shell
cd ChartT5
bash scripts/ChartQA_VLT5.sh 2
```
Citation
Please cite our paper if you use our model in your work:

```bibtex
@inproceedings{zhou2023chartt5,
  title     = {Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs},
  author    = {Zhou, Mingyang and Fung, Yi R. and Chen, Long and Thomas, Christopher and Ji, Heng and Chang, Shih-Fu},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2023},
  year      = {2023}
}
```
Acknowledgements
Our code is mainly based on VLT5. We thank the authors for open-sourcing their code and checkpoints. Portions of the code also use resources from ChartQA.
License
MIT