CoMPaSS enhances the spatial understanding of existing text-to-image diffusion models, enabling
them to generate images that faithfully reflect spatial configurations specified in the
text prompt.
Setting up the Environment
We manage our Python environment with uv, and provide a convenient script, setup_env.sh, for setting up the environment.
Running this script creates a .venv/ subdirectory in the project root. Once setup completes, activate the environment with source .venv/bin/activate:
# install requirements into .venv/
bash ./setup_env.sh
# activate the environment
source .venv/bin/activate
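To confirm the environment is active before moving on, a quick sanity check can help. This sketch assumes PyTorch is among the installed requirements (likely for this project, but defer to setup_env.sh for the actual dependency list):

# sanity_check.py -- illustrative only; assumes PyTorch was installed by setup_env.sh
import sys

import torch

print(sys.prefix)                 # should point into the project's .venv/
print(torch.__version__)          # confirms torch resolves from the venv
print(torch.cuda.is_available())  # True if a CUDA GPU is visible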
Trying out CoMPaSS
Note
For training, SCOP and TENOR are both required.
For generating images from text, only TENOR and the reference weights are needed.
ComfyUI
We recommend trying out the FLUX.1-dev LoRA trained via CoMPaSS. Please refer to the
custom node's repository to get started.
Reference Weights
The reference weights used to report all metrics in our paper are available on Hugging Face 🤗.
We recommend starting with the FLUX.1-dev weights: a rank-16 LoRA that is only 50 MB in size.
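As a rough sketch of how a rank-16 LoRA like this could be loaded with diffusers, consider the following. The repo id your-org/compass-flux-lora is a hypothetical placeholder, not the actual location of the reference weights; use the identifiers from the Hugging Face page above, and note that full CoMPaSS inference may need to go through the TENOR instructions instead:

# Illustrative sketch: loading a FLUX.1-dev LoRA with diffusers.
# "your-org/compass-flux-lora" is a HYPOTHETICAL repo id; substitute the
# actual reference weights from the Hugging Face page linked above.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("your-org/compass-flux-lora")  # hypothetical id

image = pipe(
    "a cat sitting to the left of a dog",  # spatially-specified prompt
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("compass_flux_sample.png")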
The SCOP Dataset
We provide full instructions for replicating the SCOP dataset (28,028 object pairs across 15,426 images) in the SCOP directory. Check out its README to get started.
The TENOR Module
We provide both training and inference instructions for using our TENOR module in the
TENOR directory.
Both MMDiT-based models (e.g., FLUX.1-dev) and UNet-based models (e.g., SD1.5) are supported; check out their respective instructions in the TENOR directory to get started. A sketch of the UNet-based case follows.
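For the UNet-based path, a comparable diffusers sketch for SD1.5 is shown below. Whether the SD1.5 weights load as a plain LoRA (as assumed here, with a hypothetical local path) or through TENOR-specific code is documented in the TENOR directory; treat this only as an illustration of prompting the four canonical spatial relations:

# Illustrative sketch: SD1.5 inference over four spatial relations.
# "./weights/compass_sd15_lora" is a HYPOTHETICAL path; see the TENOR
# directory for the supported loading procedure.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("./weights/compass_sd15_lora")  # hypothetical path

for relation in ["to the left of", "to the right of", "above", "below"]:
    image = pipe(f"a clock {relation} a vase").images[0]
    image.save(f"sd15_{relation.replace(' ', '_')}.png")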
Citation
If you find CoMPaSS useful in your research, please cite:
@inproceedings{zhang2025compass,
  title={CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models},
  author={Zhang, Gaoyang and Fu, Bingtao and Fan, Qingnan and Zhang, Qi and Liu, Runxing and Gu, Hong and Zhang, Huaqi and Liu, Xinguo},
  booktitle={ICCV},
  year={2025}
}