```bash
# $MODEL_PATH is where you saved the fine-tuned model.
# $DATASET_NAME is FUNSD or INV-CDIP.
bash reproduce_results.sh $MODEL_PATH $DATA_DIR/$DATASET_NAME
```
You should get the following results:

| Dataset  | Precision | Recall | F1   |
|----------|-----------|--------|------|
| FUNSD    | 60.4      | 60.9   | 60.7 |
| INV-CDIP | 50.5      | 47.6   | 49.0 |
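
As a quick sanity check (not part of the repo), F1 is the harmonic mean of precision and recall, so the reported numbers can be verified directly:

```python
# Sanity check (not repo code): F1 = 2 * P * R / (P + R).
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(round(f1(60.4, 60.9), 2))  # 60.65 -- consistent with the reported 60.7
                                 # (the paper rounds P/R before printing)
print(round(f1(50.5, 47.6), 2))  # 49.01 -- matches the reported 49.0
```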
## Pre-training
You can skip the following steps by downloading our pre-trained SimpleDLM model here.
```bash
# $NUM_GPUS is the number of GPUs to use for pre-training. To reproduce
# the paper's results, we recommend using 8 GPUs.
# $MODEL_PATH is where you saved the LayoutLM model.
# $PRETRAIN_DATA_FOLDER is the folder containing the IIT-CDIP hOCR files.
python -m torch.distributed.launch --nproc_per_node=$NUM_GPUS pretraining.py \
    --model_name_or_path $MODEL_PATH --data_dir $PRETRAIN_DATA_FOLDER \
    --output_dir $OUTPUT_DIR
```
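
For reference, hOCR files are plain HTML in which each recognized word is a `span` of class `ocrx_word` whose `title` attribute carries the bounding box. A minimal sketch of pulling (word, bbox) pairs out of one IIT-CDIP hOCR file; the `read_hocr` helper and the BeautifulSoup dependency are illustrative assumptions, not code from this repo:

```python
# Minimal sketch (not repo code): extract (word, bbox) pairs from an hOCR
# file. Standard hOCR stores each word as
#   <span class="ocrx_word" title="bbox x0 y0 x1 y1; ...">word</span>
import re
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def read_hocr(path):
    with open(path, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")
    words = []
    for span in soup.find_all("span", class_="ocrx_word"):
        match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", span.get("title", ""))
        text = span.get_text(strip=True)
        if match and text:
            words.append((text, tuple(map(int, match.groups()))))
    return words

# read_hocr("page.hocr") -> [("Invoice", (82, 40, 210, 64)), ...]
```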
## Fine-tuning

Run fine-tuning as follows:
```bash
# $MODEL_PATH is where you saved the pre-trained SimpleDLM model.
CUDA_VISIBLE_DEVICES=0 python run_query_value_retrieval.py --model_type simpledlm --model_name_or_path $MODEL_PATH \
    --data_dir $DATA_DIR/FUNSD/ --output_dir $OUTPUT_DIR --do_train --evaluate_during_training
```
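
After training, the checkpoint in `$OUTPUT_DIR` can be reloaded for inspection or inference. The sketch below assumes the checkpoint follows the Hugging Face `transformers` save format that LayoutLM-style models use; the repo's actual SimpleDLM class may differ from `LayoutLMModel`:

```python
# Sketch under the assumption that the checkpoint is Hugging Face-compatible
# with LayoutLM. Replace the path with your $OUTPUT_DIR.
import torch
from transformers import LayoutLMModel, LayoutLMTokenizer

model = LayoutLMModel.from_pretrained("/path/to/output_dir")
tokenizer = LayoutLMTokenizer.from_pretrained("/path/to/output_dir")
model.eval()

# LayoutLM-style models expect a normalized (0-1000) bounding box per token
# in addition to the token ids; a full-page box is used here as a dummy.
inputs = tokenizer("Total Amount", return_tensors="pt")
bbox = torch.tensor([[[0, 0, 1000, 1000]] * inputs.input_ids.shape[1]])
with torch.no_grad():
    outputs = model(**inputs, bbox=bbox)
print(outputs.last_hidden_state.shape)
```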
## Citation
If you find this codebase useful, please cite our paper:
```bibtex
@article{gao2021value,
  title={Value Retrieval with Arbitrary Queries for Form-like Documents},
  author={Gao, Mingfei and Xue, Le and Ramaiah, Chetan and Xing, Chen and Xu, Ran and Xiong, Caiming},
  journal={arXiv preprint arXiv:2112.07820},
  year={2021}
}
```