Transformers for vision typically flatten images in a fixed row-major order, but this choice can significantly impact performance due to architectural approximations that are sensitive to patch order.
This repo introduces REOrder, a framework that discovers task-specific patch orderings by combining compressibility-based priors with learned permutation policies. REOrder boosts accuracy on datasets like ImageNet-1K and Functional Map of the World, demonstrating that smarter patch sequencing can meaningfully improve transformer performance.
With conda:

```bash
conda create -n reorder python=3.11
conda activate reorder
```

or with pip:
```bash
python3 -m venv .reorder
source .reorder/bin/activate
```

then install the required packages:
```bash
pip3 install torch torchvision torchaudio pyyaml omegaconf wandb gpustat transformers timm matplotlib numpy ninja pytest torchinfo
pip install "causal-conv1d @ git+https://github.com/Dao-AILab/causal-conv1d.git@v1.5.0.post8"
pip install "git+https://github.com/hustvl/Vim.git@main#egg=mamba_ssm&subdirectory=mamba-1p1p1"
```

An environment.yml and a requirements.txt are also provided.
./configs/ contains all the configurations used to train the models in the paper. Configurations are set up using OmegaConf, and everything needed to reproduce the experiments in the paper is provided.
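If you want to inspect or override a config programmatically, OmegaConf makes this straightforward. A minimal sketch follows; the filename and keys below are hypothetical, so see the YAMLs in ./configs/ for the actual schema:

```python
from omegaconf import OmegaConf

# Load a config and merge in a command-line-style override.
cfg = OmegaConf.load("configs/example.yaml")            # hypothetical filename
overrides = OmegaConf.from_dotlist(["train.lr=3e-4"])   # hypothetical key
cfg = OmegaConf.merge(cfg, overrides)
print(OmegaConf.to_yaml(cfg))
```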
./configs/ also contains a directory for path YAMLs. A path YAML defines the filesystem paths for a given host, so that training can be launched across many machines: define one path YAML per hostname, then add a statement in src/config/utils.py::get_path_config_for_hostname() that resolves the new hostname to your path YAML file, as in the sketch below.
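A minimal sketch of what such a resolver might look like (the function body and file paths here are illustrative, not the repo's actual implementation; adapt the branch to your machine):

```python
import socket

def get_path_config_for_hostname() -> str:
    """Map the current hostname to its path YAML (illustrative sketch)."""
    hostname = socket.gethostname()
    if hostname == "my-new-server":                # your machine's hostname
        return "configs/paths/my_new_server.yaml"  # hypothetical path YAML
    raise ValueError(f"No path config registered for host {hostname!r}")
```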
Distributed training can be launched with:
```bash
torchrun --nproc_per_node=N main.py --config=path/to/config.yaml
```
Set --nproc_per_node to the number of processes (typically one per GPU) on each node, not the number of nodes. In our experiments, we use either 4x 40GB A100s or 8x 80GB A100s; details for each experiment are in the Appendix.
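To pick the right value, you can query how many GPUs the node exposes (a general PyTorch check, not something specific to this repo):

```python
import torch

# Use this count as --nproc_per_node when launching with torchrun.
print(torch.cuda.device_count())
```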
We have tested this repo with AMD MI250 and MI300 GPUs. It works, with the caveat that model compilation has to be turned off; perhaps a future version of TorchDynamo will address this issue. None of our experiments rely on AMD GPUs, but if you use them, this repo will work!
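One way to apply that caveat is to gate torch.compile on the GPU backend. This is a hypothetical guard, not code from this repo; it relies on torch.version.hip being None on CUDA builds and set on ROCm builds:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # stand-in for the actual model

# Skip torch.compile on ROCm (AMD) builds, where TorchDynamo currently
# causes issues; compile as usual on CUDA builds.
if torch.version.hip is None:
    model = torch.compile(model)
```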
./launch contains a set of submitit scripts used to launch experiments on Slurm clusters. You will have to change the Slurm arguments to ones that fit your cluster's setup.
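For reference, submitit jobs are typically configured along these lines; the parameter values and entry point below are placeholders, and the actual scripts in ./launch are authoritative:

```python
import submitit

def train():
    # Placeholder for the real training entry point.
    print("training...")

executor = submitit.AutoExecutor(folder="slurm_logs")
executor.update_parameters(
    name="reorder",          # job name (placeholder)
    nodes=1,
    tasks_per_node=4,        # one task per GPU
    gpus_per_node=4,
    cpus_per_task=8,
    timeout_min=24 * 60,
    slurm_partition="gpu",   # replace with your cluster's partition
)
job = executor.submit(train)
print(job.job_id)
```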
If you use this code in your research, please cite our paper:
```bibtex
@misc{kutscher2025REOrder,
      title={REOrdering Patches Improves Vision Models},
      author={Declan Kutscher and David M. Chan and Yutong Bai and Trevor Darrell and Ritwik Gupta},
      year={2025},
      eprint={2505.23751},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.23751},
}
```