This repository contains the official implementation of Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation (ICCV 2025).
We propose a framework that enables Vision-Language Models to perform spatial reasoning in arbitrary perspectives.
🔧 Get Started
We have tested our code with Python 3.10, CUDA 12.4, and PyTorch 2.4.1. Please follow the scripts below to set up the environment.
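A minimal setup sketch consistent with the versions above. This is an assumption, not the repository's official script: the environment name `apc`, the CUDA wheel index, and the presence of a `requirements.txt` are all hypothetical — check the repo for its actual instructions.

```shell
# Hypothetical environment setup (adjust to the repo's actual requirements).
conda create -n apc python=3.10 -y
conda activate apc

# PyTorch 2.4.1 built against CUDA 12.4 (wheel index assumed).
pip install torch==2.4.1 --index-url https://download.pytorch.org/whl/cu124

# Remaining dependencies, assuming the repo ships a requirements file.
pip install -r requirements.txt
```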
We provide an easy-to-use notebook, run_APC.ipynb, for quickly testing our APC framework.
Alternatively, you can run inference directly with run_APC.py. For example:
```shell
python run_APC.py \
    --config apc/configs/qwenvl2_5_7b_instruct.yaml \
    --device_vlm cuda:0 \
    --device_vision cuda:0 \
    --image_path demo/sample_image_man.jpg \
    --prompt "If I stand at the person's position facing where it is facing, is the table on the left or on the right of me?" \
    --save_dir outputs/demo/man_table \
    --visualize_trace \
    --return_conv_history
```
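For reference, the flags in the command above map onto a standard command-line interface. The sketch below is a hypothetical reconstruction of that surface with `argparse` — argument names and defaults mirror the example command, not the repository's actual code:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI sketch mirroring the run_APC.py flags shown above."""
    p = argparse.ArgumentParser(description="APC inference (sketch, not the official parser)")
    p.add_argument("--config", required=True,
                   help="YAML config, e.g. apc/configs/qwenvl2_5_7b_instruct.yaml")
    p.add_argument("--device_vlm", default="cuda:0",
                   help="device for the vision-language model")
    p.add_argument("--device_vision", default="cuda:0",
                   help="device for the vision models")
    p.add_argument("--image_path", required=True, help="input image")
    p.add_argument("--prompt", required=True, help="spatial-reasoning question")
    p.add_argument("--save_dir", default="outputs", help="output directory")
    p.add_argument("--visualize_trace", action="store_true",
                   help="save intermediate reasoning visualizations")
    p.add_argument("--return_conv_history", action="store_true",
                   help="save the full conversation history")
    return p

# Parse the demo command's arguments from an explicit list.
demo = build_parser().parse_args([
    "--config", "apc/configs/qwenvl2_5_7b_instruct.yaml",
    "--image_path", "demo/sample_image_man.jpg",
    "--prompt", "Is the table on my left or right?",
    "--visualize_trace",
])
```

Boolean flags such as `--visualize_trace` default to off and are enabled simply by being present, which matches how they appear in the example command.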
An example of the saved conversation history from APC is as follows:
If you find our work useful, please consider citing:
```bibtex
@inproceedings{lee2025perspective,
  title={Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation},
  author={Lee, Phillip Y. and Je, Jihyeon and Park, Chanho and Uy, Mikaela Angelina and Guibas, Leonidas and Sung, Minhyuk},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}
```