Re-Thinking Inverse Graphics With Large Language Models

Peter Kulits^*, Haiwen Feng^*, Weiyang Liu, Victoria Abrevaya, Michael J. Black

Summary

We present the Inverse-Graphics Large Language Model (IG-LLM) framework, a general approach to solving inverse-graphics problems. We instruction-tune an LLM to decode a visual (CLIP) embedding into graphics code that can be used to reproduce the observed scene using a standard graphics engine. Leveraging the broad reasoning abilities of LLMs, we demonstrate that our framework exhibits natural generalization across a variety of distribution shifts without the use of special inductive biases.

Data

Training and evaluation data can be found at https://ig-llm.is.tue.mpg.de/download.php after registering on the project page. The following is an outline of the data available:

├── CLEVR
│   ├── images
│   │   ├── train.tar
│   │   ├── val_ID.tar
│   │   └── val_OOD.tar
│   └── labels
│       ├── train.json
│       ├── val_ID.json
│       └── val_OOD.json
├── 2D
│   └── 2d.npz
├── SO3
│   ├── images
│   │   ├── train.tar
│   │   ├── val_ID.tar
│   │   └── val_OOD.tar
│   └── labels
│       ├── train.json
│       ├── val_ID.json
│       └── val_OOD.json
├── 6DoF
│   ├── images
│   │   ├── train.tar
│   │   └── val_ID.tar
│   └── labels
│       ├── train.json
│       └── val_ID.json
└── ShapeNet
    ├── images
    │   ├── train.tar
    │   ├── val_ID.tar
    │   ├── val_OOD_texture.tar
    │   └── val_OOD_shape.tar
    └── labels
        ├── train.json
        ├── val_ID.json
        ├── val_OOD_texture.json
        └── val_OOD_shape.json
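
Before starting a long training run, it can help to confirm that the download is complete. The following is a minimal sketch that transcribes the outline above into a list of expected paths and reports any that are absent. The names `SPLITS`, `expected_files`, and `missing_files` are illustrative and are not part of this repository.

```python
from pathlib import Path

# Splits per dataset, transcribed from the directory outline above.
SPLITS = {
    "CLEVR": ["train", "val_ID", "val_OOD"],
    "SO3": ["train", "val_ID", "val_OOD"],
    "6DoF": ["train", "val_ID"],
    "ShapeNet": ["train", "val_ID", "val_OOD_texture", "val_OOD_shape"],
}


def expected_files():
    """Relative paths of every file named in the outline."""
    paths = [Path("2D") / "2d.npz"]
    for dataset, splits in SPLITS.items():
        for split in splits:
            paths.append(Path(dataset) / "images" / f"{split}.tar")
            paths.append(Path(dataset) / "labels" / f"{split}.json")
    return paths


def missing_files(root):
    """Return the expected files that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected_files() if not (root / p).exists()]
```

Calling `missing_files("data")` returns an empty list when everything in the outline is present under `data/`. This checks the download layout only: the outline places `2d.npz` under `2D/`, whereas the 2D training notes expect it at `data/2d.npz`.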

Setup

The environment can be configured with conda/micromamba from environment.yml or using the Dockerfile.

Training

After the data has been downloaded, training can be initiated with the following:

CLEVR

python train.py \
    --images_tar data/CLEVR/images/train.tar \
    --data_path data/CLEVR/labels/train.json \
    --images_val_tar data/CLEVR/images/val_OOD.tar \
    --data_path_val data/CLEVR/labels/val_OOD.json \
    --per_device_train_batch_size X \
    --output_dir ./checkpoints/clevr-Y \
    --max_steps 40000 \
    --float_head_type (none|tanh_mlp_gelu) \
    --image_aspect_ratio pad \
    --num_samples 4000
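
The command above contains placeholders: `X` for the per-device batch size, `Y` for a run name, and `(none|tanh_mlp_gelu)`, read here as a choice between two alternatives. As a rough sketch of how these might be filled in, the helper below assembles the same flags into an argument list. The function name, its parameters, and the example values (batch size 8, run name `demo`) are assumptions for illustration; the label path follows the directory outline in the Data section.

```python
import shlex

# The two alternatives shown for --float_head_type in the command above.
FLOAT_HEAD_TYPES = ("none", "tanh_mlp_gelu")


def clevr_train_command(batch_size, run_name, float_head_type):
    """Return the CLEVR training command as an argument list.

    `batch_size` stands in for X and `run_name` for Y; both names are
    illustrative and not part of train.py.
    """
    if float_head_type not in FLOAT_HEAD_TYPES:
        raise ValueError(f"float_head_type must be one of {FLOAT_HEAD_TYPES}")
    return [
        "python", "train.py",
        "--images_tar", "data/CLEVR/images/train.tar",
        "--data_path", "data/CLEVR/labels/train.json",
        "--images_val_tar", "data/CLEVR/images/val_OOD.tar",
        "--data_path_val", "data/CLEVR/labels/val_OOD.json",
        "--per_device_train_batch_size", str(batch_size),
        "--output_dir", f"./checkpoints/clevr-{run_name}",
        "--max_steps", "40000",
        "--float_head_type", float_head_type,
        "--image_aspect_ratio", "pad",
        "--num_samples", "4000",
    ]


# Example values only; choose a batch size that fits the available GPU memory.
command = clevr_train_command(batch_size=8, run_name="demo", float_head_type="none")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root, which avoids shell-quoting issues.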

2D

2d.npz is expected to be at data/2d.npz prior to running train.py.

python train.py \
    --data_path checkerboard_sparse \
    --data_path_val random \
    --per_device_train_batch_size X \
    --output_dir ./checkpoints/2d-Y \
    --max_steps 40000 \
    --float_head_type (none|tanh_mlp_gelu) \
    --image_aspect_ratio pad \
    --is_2d True
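
Because `train.py` expects `2d.npz` at `data/2d.npz`, a quick check that the file is in place and readable can save a failed launch. An `.npz` file is a zip archive holding one `.npy` member per stored array, so its contents can be listed with the standard library alone. This is a minimal sketch; the helper name `npz_array_names` is hypothetical, and the names of the arrays inside `2d.npz` are not documented here.

```python
import zipfile


def npz_array_names(path):
    """List the arrays stored in an .npz archive without loading them."""
    with zipfile.ZipFile(path) as archive:
        return [
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        ]
```

For example, `npz_array_names("data/2d.npz")` raises `FileNotFoundError` if the file has not been moved into place, and otherwise returns the stored array names.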

SO(3)

python train.py \
    --images_tar data/SO3/images/train.tar \
    --data_path data/SO3/labels/train.json \
    --images_val_tar data/SO3/images/val_OOD.tar \
    --data_path_val data/SO3/labels/val_OOD.json \
    --per_device_train_batch_size X \
    --output_dir ./checkpoints/so3-Y \
    --max_steps 40000 \
    --float_head_type (none|tanh_mlp_gelu) \
    --image_aspect_ratio pad \
    --rotation_rep (euler_int|euler|aa|6d)
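
The `--rotation_rep` options are not described further here. Assuming `6d` refers to the common continuous six-dimensional rotation representation, the sketch below shows how such a vector maps to a rotation matrix: the six numbers are treated as two 3-vectors, orthonormalized with Gram-Schmidt, and completed with a cross product. The column convention and all function names are assumptions for illustration and may differ from what `train.py` does internally.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_6d(rep):
    """Map six numbers to a 3x3 rotation matrix (as nested row lists)."""
    a1, a2 = rep[:3], rep[3:]
    b1 = normalize(a1)
    # Remove the component of a2 along b1, then normalize (Gram-Schmidt).
    d = sum(x * y for x, y in zip(b1, a2))
    b2 = normalize([x - d * y for x, y in zip(a2, b1)])
    b3 = cross(b1, b2)
    # b1, b2, b3 become the columns of the matrix (assumed convention).
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


def rotation_to_6d(matrix):
    """Keep the first two columns of a rotation matrix as six numbers."""
    return [matrix[i][0] for i in range(3)] + [matrix[i][1] for i in range(3)]
```

Because the orthonormalization step accepts any two non-parallel vectors, every such six-number output corresponds to a valid rotation, which is the usual motivation for this representation over Euler angles or axis-angle (`aa`).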

6-DoF

python train.py \
    --images_tar data/6DoF/images/train.tar \
    --data_path data/6DoF/labels/train.json \
    --images_val_tar data/6DoF/images/val_ID.tar \
    --data_path_val data/6DoF/labels/val_ID.json \
    --per_device_train_batch_size X \
    --output_dir ./checkpoints/6dof-Y \
    --max_steps 200000 \
    --float_head_type (none|tanh_mlp_gelu) \
    --image_aspect_ratio pad \
    --rotation_rep (euler_int|euler|aa|6d)

ShapeNet

python train.py \
    --images_tar data/ShapeNet/images/train.tar \
    --data_path data/ShapeNet/labels/train.json \
    --images_val_tar data/ShapeNet/images/val_OOD_texture.tar \
    --data_path_val data/ShapeNet/labels/val_OOD_texture.json \
    --per_device_train_batch_size X \
    --output_dir ./checkpoints/shapenet-Y \
    --max_steps 500000 \
    --float_head_type (none|tanh_mlp_gelu) \
    --image_aspect_ratio pad \
    --rotation_rep (euler_int|euler|aa|6d)

Inference

python inference.py \
    --model-path ./checkpoints/clevr-Y \
    --images_tar data/CLEVR/images/val_OOD.tar \
    --out_path ./out/clevr-Y-val_OOD.json \
    --image_aspect_ratio pad

License

We build on the LLaVA codebase to perform our experiments. As such, inherited code falls under the original Apache 2.0 license. Additions and modifications are released under a different license, in accordance with institute requirements; that license has been prepended to LICENSE.
