OPUS: Occupancy Prediction Using a Sparse Set
- Authors: Jiabao Wang*, Zhaojiang Liu*, Qiang Meng, Liujiang Yan, Ke Wang, Jie Yang, Wei Liu, Qibin Hou#, Ming-Ming Cheng
- Paper on arXiv | Zhihu
OPUS-V2: Bridging the Gap between Continuous and Discrete Occupancy
- Authors: Jiabao Wang*, Qiang Meng, Liujiang Yan, Ke Wang, Qibin Hou#, Ming-Ming Cheng
- Paper on arXiv | Zhihu
(* Equal contribution, # Corresponding author)
- [2025/09/18]: We release the pretrained models of OPUS-Fusion.
- [2025/02/10]: 🚀We release the fusion version of OPUS. The performance has been boosted to 51.4 mIoU and 51.8 RayIoU on the NuScene-Occ3D dataset.
- [2025/01/10]: We release the visualization code.
- [2024/09/26]: 🚀OPUS is accepted by NeurIPS 2024.
- [2024/09/17]: 🚀We release an initial version of OPUS. It achieves promising performance of 41.2 RayIoU and 36.2 mIoU on the NuScene-Occ3D dataset.
OPUS: Occupancy Prediction Using a Sparse Set
Occupancy prediction, which aims to predict the occupancy status within a voxelized 3D environment, is quickly gaining momentum within the autonomous driving community. Mainstream occupancy prediction works first discretize the 3D environment into voxels, then perform classification on such dense grids. However, inspection of sample data reveals that the vast majority of voxels are unoccupied. Performing classification on these empty voxels wastes computation, and reducing such empty voxels necessitates complex algorithm designs. To this end, we present a novel perspective on the occupancy prediction task: formulating it as a streamlined set prediction paradigm without the need for explicit space modeling or complex sparsification procedures. Our proposed framework, called OPUS, utilizes a transformer encoder-decoder architecture to simultaneously predict occupied locations and classes using a set of learnable queries. Firstly, we employ the Chamfer distance loss to scale the set-to-set comparison problem to unprecedented magnitudes, making end-to-end training of such a model a reality. Subsequently, semantic classes are adaptively assigned using nearest neighbor search based on the learned locations. In addition, OPUS incorporates a suite of non-trivial strategies to enhance model performance, including coarse-to-fine learning, consistent point sampling, and adaptive re-weighting. Finally, compared with current state-of-the-art methods, our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at nearly 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
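As a rough illustration of the two ideas named in the abstract (a Chamfer distance between point sets, and class assignment by nearest-neighbor search), here is a minimal NumPy sketch. It is not the repository's implementation: the function names `chamfer_distance` and `assign_labels`, the squared-L2 unweighted variant, and the toy data are all assumptions made for this example.

```python
import numpy as np

def pairwise_sq_dist(a, b):
    """Squared Euclidean distances between (N, 3) and (M, 3) points, shape (N, M)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance (assumed squared-L2, unweighted variant)."""
    d2 = pairwise_sq_dist(pred, gt)
    # Mean distance from each prediction to its nearest GT point, plus the reverse.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def assign_labels(pred, gt, gt_labels):
    """Give each predicted point the class of its nearest ground-truth point."""
    nearest = pairwise_sq_dist(pred, gt).argmin(axis=1)
    return gt_labels[nearest]

# Toy data (hypothetical): two ground-truth points, three predicted points.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt_labels = np.array([3, 7])
pred = np.array([[0.1, 0.0, 0.0], [0.9, 0.0, 0.0], [1.2, 0.0, 0.0]])

loss = chamfer_distance(pred, gt)
labels = assign_labels(pred, gt, gt_labels)
```

Note that no one-to-one matching between predictions and ground truth is computed; each side only looks up its nearest neighbor on the other side, which is what lets this kind of loss scale to large point sets.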
OPUS-V2: Bridging the Gap between Continuous and Discrete Occupancy
State-of-the-art occupancy prediction methods often achieve high accuracy at the cost of significant computational expense, thus hindering their spatial extension. While the recent OPUS framework introduced an efficient point-set prediction paradigm, its accuracy trails behind more complex models. In this paper, we identify a representation gap problem of OPUS between its continuous predictions and the discrete ground-truth voxels. This gap necessitates manually constructed intermediates, which introduce training instability and additional errors. To mitigate this problem, this paper proposes OPUS-V2, which incorporates a lightweight Continuous Prediction Discretization (CPD) module behind the decoder. The CPD adaptively maps continuous predictions into the discrete voxel space, obviating the need for suboptimal handcrafted intermediates and thereby enhancing model accuracy. Furthermore, our architecture decouples the feature and occupancy generation processes, allowing OPUS-V2 to adapt to arbitrary resolutions. OPUS-V2 achieves a state-of-the-art RayIoU of 44.0 on the Occ3D dataset. On the more challenging OpenOccupancy dataset, it attains a competitive 16.4 mIoU while running at a real-time 20.6 FPS.
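To make the representation gap concrete, the sketch below shows a naive hard discretization of continuous points into a voxel grid. This is explicitly not the CPD module, which is learned; it is the kind of handcrafted mapping the abstract argues against. The grid bounds, the 0.4 m voxel size, the 200x200x16 shape, and the free-space label 17 are assumptions taken from the commonly used Occ3D-nuScenes setup, and `voxelize` is a hypothetical helper.

```python
import numpy as np

# Assumed grid: x, y in [-40, 40) m, z in [-1, 5.4) m, 0.4 m voxels.
PC_MIN = np.array([-40.0, -40.0, -1.0])
VOXEL_SIZE = 0.4
GRID_SHAPE = (200, 200, 16)
FREE_LABEL = 17  # assumed label for unoccupied voxels

def voxelize(points, labels):
    """Write per-point class labels into a dense voxel grid by flooring coordinates."""
    grid = np.full(GRID_SHAPE, FREE_LABEL, dtype=np.int64)
    idx = np.floor((points - PC_MIN) / VOXEL_SIZE).astype(np.int64)
    # Drop points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < np.array(GRID_SHAPE)), axis=1)
    idx, labels = idx[inside], labels[inside]
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = labels
    return grid

# Two nearby points collapse into one voxel; the third lies outside the grid.
points = np.array([[0.1, 0.1, 0.1], [0.3, 0.3, 0.15], [100.0, 0.0, 0.0]])
labels = np.array([4, 4, 1])
grid = voxelize(points, labels)
```

The flooring step discards sub-voxel position and merges distinct predictions, which hints at why a fixed rule between continuous outputs and discrete ground truth can introduce errors.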
- Comparison between different types of predictions.
- Occupancy prediction in long term.
opusv2_results.mp4
Camera only OPUS-V2 on NuScene-Occ3D dataset
| Models | Epochs | Q | P | mIoU | RayIoU1m | RayIoU2m | RayIoU4m | RayIoU | FPS | Link |
| OPUS-V2-T | 100 | 600 | 128 | 36.5 | 35.8 | 42.8 | 47.3 | 42.0 | 25.8 | Model |
| OPUS-V2-S | 100 | 1200 | 64 | 37.3 | 36.7 | 43.5 | 47.8 | 42.7 | 23.7 | Model |
| OPUS-V2-M | 100 | 2400 | 32 | 37.7 | 37.2 | 44.3 | 48.5 | 43.3 | 16.0 | Model |
| OPUS-V2-L | 100 | 4800 | 16 | 38.6 | 38.0 | 45.0 | 49.2 | 44.0 | 8.6 | Model |
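The overall RayIoU column in the table above is consistent with a plain average of the three distance thresholds (1 m, 2 m, 4 m). The README does not state this definition, so treat it as an assumption; the short check below only confirms that the reported numbers agree with it up to rounding.

```python
# (RayIoU1m, RayIoU2m, RayIoU4m, reported RayIoU), copied from the table above.
rows = {
    "OPUS-V2-T": (35.8, 42.8, 47.3, 42.0),
    "OPUS-V2-S": (36.7, 43.5, 47.8, 42.7),
    "OPUS-V2-M": (37.2, 44.3, 48.5, 43.3),
    "OPUS-V2-L": (38.0, 45.0, 49.2, 44.0),
}

def mean_rayiou(r1, r2, r4):
    """Average of the per-threshold RayIoU values (assumed definition)."""
    return (r1 + r2 + r4) / 3

for name, (r1, r2, r4, reported) in rows.items():
    print(f"{name}: mean={mean_rayiou(r1, r2, r4):.2f} reported={reported}")
```

Small differences are expected because the per-threshold values in the table are themselves rounded.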
Camera only OPUS-V2 on NuScene-OpenOccupancy dataset
| Models | IoU | mIoU | barrier | bicycle | bus | car | const. veh. | motorcycle | pedestrian | traffic cone | trailer | truck | drive. suf. | other flat | sidewalk | terrain | manmade | vegetation | FPS | Link |
| OPUS-V2-T | 27.4 | 16.4 | 17.5 | 7.5 | 16.2 | 18.9 | 10.4 | 11.7 | 6.9 | 6.5 | 8.2 | 14.9 | 39.3 | 27.3 | 25.4 | 23.2 | 11.7 | 16.4 | 20.6 | Model |
| OPUS-V2-S | 27.6 | 16.7 | 17.0 | 9.1 | 15.8 | 19.2 | 10.1 | 12.7 | 8.1 | 8.0 | 8.1 | 14.9 | 39.6 | 27.0 | 25.8 | 23.8 | 11.6 | 16.6 | 19.2 | Model |
| OPUS-V2-M | 27.9 | 17.4 | 18.5 | 11.2 | 15.9 | 19.5 | 10.5 | 13.8 | 9.4 | 9.3 | 8.2 | 15.3 | 39.8 | 27.5 | 26.2 | 23.8 | 12.4 | 17.1 | 13.7 | Model |
| OPUS-V2-L | 28.7 | 18.1 | 19.3 | 11.5 | 16.4 | 19.9 | 11.8 | 15.0 | 10.0 | 10.6 | 8.4 | 15.8 | 39.9 | 27.7 | 26.8 | 24.5 | 13.9 | 18.0 | 8.0 | Model |
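In the OpenOccupancy table, the mIoU column matches the unweighted mean of the 16 per-class IoU columns. The snippet below reproduces that arithmetic from the table values; the averaging rule is inferred from the numbers rather than stated in the README.

```python
# Per-class IoUs (barrier ... vegetation) and reported mIoU, copied from the table above.
per_class = {
    "OPUS-V2-T": ([17.5, 7.5, 16.2, 18.9, 10.4, 11.7, 6.9, 6.5,
                   8.2, 14.9, 39.3, 27.3, 25.4, 23.2, 11.7, 16.4], 16.4),
    "OPUS-V2-S": ([17.0, 9.1, 15.8, 19.2, 10.1, 12.7, 8.1, 8.0,
                   8.1, 14.9, 39.6, 27.0, 25.8, 23.8, 11.6, 16.6], 16.7),
    "OPUS-V2-M": ([18.5, 11.2, 15.9, 19.5, 10.5, 13.8, 9.4, 9.3,
                   8.2, 15.3, 39.8, 27.5, 26.2, 23.8, 12.4, 17.1], 17.4),
    "OPUS-V2-L": ([19.3, 11.5, 16.4, 19.9, 11.8, 15.0, 10.0, 10.6,
                   8.4, 15.8, 39.9, 27.7, 26.8, 24.5, 13.9, 18.0], 18.1),
}

def mean_iou(ious):
    """Unweighted mean over semantic classes."""
    return sum(ious) / len(ious)

for name, (ious, reported) in per_class.items():
    print(f"{name}: mean={mean_iou(ious):.2f} reported={reported}")
```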
Camera Lidar fusion OPUS-V1 on NuScene-Occ3D dataset
| Models | Epochs | Q | P | mIoU | RayIoU1m | RayIoU2m | RayIoU4m | RayIoU | FPS | Link |
| OPUS-V1-Fus-T | 100 | 600 | 128 | 48.7 | 45.4 | 50.3 | 53.3 | 49.7 | 10.2 | Model |
| OPUS-V1-Fus-S | 100 | 1200 | 64 | 49.6 | 45.9 | 51.0 | 54.1 | 50.4 | 9.5 | Model |
| OPUS-V1-Fus-M | 100 | 2400 | 32 | 50.5 | 46.4 | 51.2 | 54.2 | 50.6 | 6.9 | Model |
| OPUS-V1-Fus-L | 100 | 4800 | 16 | 51.4 | 47.6 | 52.4 | 55.3 | 51.8 | 3.2 | Model |
Camera only OPUS-V1 on NuScene-Occ3D dataset
| Models | Epochs | Q | P | mIoU | RayIoU1m | RayIoU2m | RayIoU4m | RayIoU | FPS | Link |
| OPUS-V1-T | 100 | 600 | 128 | 33.2 | 31.7 | 39.2 | 44.3 | 38.4 | 22.4 | Model |
| OPUS-V1-S | 100 | 1200 | 64 | 34.2 | 32.6 | 39.9 | 44.7 | 39.1 | 20.7 | Model |
| OPUS-V1-M | 100 | 2400 | 32 | 35.6 | 33.7 | 41.1 | 46.0 | 40.3 | 13.4 | Model |
| OPUS-V1-L | 100 | 4800 | 16 | 36.2 | 34.7 | 42.1 | 46.7 | 41.2 | 7.2 | Model |
note: Q denotes query numbers. P is the number of predicted points per query.
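Multiplying the two columns shows that every model size in the tables above predicts the same total number of points; the sizes differ only in how those points are split across queries. The dictionary below simply restates the Q and P columns.

```python
# (Q, P) per model size, copied from the tables above.
configs = {"T": (600, 128), "S": (1200, 64), "M": (2400, 32), "L": (4800, 16)}

# Total predicted points = number of queries x points per query.
totals = {name: q * p for name, (q, p) in configs.items()}

for name, total in totals.items():
    print(f"{name}: {total} points")
```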
We build OPUS on PyTorch 1.13.1 + CUDA 11.6:
conda create -n opus python=3.8
conda activate opus
conda install pytorch==1.13.1 torchvision==0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
Install other dependencies:
pip install spconv-cu118 # Replace cu118 with the suffix matching your CUDA version
pip install openmim
mim install mmcv-full==1.6.0
mim install mmdet==2.28.2
mim install mmsegmentation==0.30.0
mim install mmdet3d==1.0.0rc6
Install turbojpeg and pillow-simd to speed up data loading (optional but important):
sudo apt-get update
sudo apt-get install -y libturbojpeg
pip install pyturbojpeg
pip uninstall pillow
pip install pillow-simd==9.0.0.post1
Compile CUDA extensions:
cd models/csrc
python setup.py build_ext --inplace
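After installation, a quick sanity check is to confirm that the Python packages from the steps above are importable. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical, and the import names (`mmseg` for mmsegmentation, `turbojpeg` for pyturbojpeg) are the usual module names for those packages. It does not verify versions or the compiled CUDA extensions.

```python
from importlib.util import find_spec

# Import names of the packages installed in the steps above.
REQUIRED = ["torch", "torchvision", "spconv", "mmcv", "mmdet", "mmseg", "mmdet3d"]
OPTIONAL = ["turbojpeg"]

def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]

if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED))
    print("missing optional:", missing_modules(OPTIONAL))
```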
- Download nuScenes from https://www.nuscenes.org/nuscenes and place it in folder data/nuscenes.
- (Optional) Download Occ3d-nuScenes from the link and place it in data/nuscenes/gts
- (Optional) Download Occ3d-OpenOccupancy from the link and place it in data/nuscenes/occupancy
- Prepare data with scripts provided by mmdet3d:
mim run mmdet3d create_data nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
- Perform data preparation for OPUS:
python gen_sweep_info.py
The final folder structure should be:
data/nuscenes
├── maps
├── nuscenes_infos_test_sweep.pkl
├── nuscenes_infos_train_sweep.pkl
├── nuscenes_infos_train_mini_sweep.pkl
├── nuscenes_infos_val_sweep.pkl
├── nuscenes_infos_val_mini_sweep.pkl
├── samples
├── sweeps
├── gts % Occ3D dataset (Optional)
├── occupancy % OpenOccupancy dataset (Optional)
├── v1.0-test
└── v1.0-trainval
Note: These *.pkl files can also be generated with our script: gen_sweep_info.py.
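Before training, it can help to verify that the folder structure listed above is in place. The following is a small sketch using `pathlib`; `check_layout` is a hypothetical helper, not part of the repository, and it only tests that each entry exists.

```python
from pathlib import Path

# Entries from the folder structure above; gts and occupancy are marked optional there.
EXPECTED = [
    "maps", "samples", "sweeps", "v1.0-test", "v1.0-trainval",
    "nuscenes_infos_test_sweep.pkl",
    "nuscenes_infos_train_sweep.pkl",
    "nuscenes_infos_train_mini_sweep.pkl",
    "nuscenes_infos_val_sweep.pkl",
    "nuscenes_infos_val_mini_sweep.pkl",
]
OPTIONAL = ["gts", "occupancy"]

def check_layout(root):
    """Return (missing expected entries, missing optional entries) under root."""
    root = Path(root)
    missing = [name for name in EXPECTED if not (root / name).exists()]
    missing_optional = [name for name in OPTIONAL if not (root / name).exists()]
    return missing, missing_optional

if __name__ == "__main__":
    missing, missing_optional = check_layout("data/nuscenes")
    print("missing:", missing)
    print("missing (optional):", missing_optional)
```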
Download the pre-trained weights provided by mmdet3d and put them in the directory pretrain/.
If you want to train the OPUS-Fusion model, download the DAL-tiny pre-trained weights into the directory pretrain/ and run python scripts/gen_fusion_pretrain_model.py.
pretrain
├── cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth
├── dal-tiny-map66.9-nds71.1.pth (optional)
├── fusion_pretrain_model.pth (optional)
Train OPUS with a single GPU:
python train.py --config configs/opusv1_nusc-occ3d/opusv1-t_r50_704x256_8f_nusc-occ3d_100e.py
Train OPUS with 8 GPUs:
bash dist_train.sh 8 configs/opusv1_nusc-occ3d/opusv1-t_r50_704x256_8f_nusc-occ3d_100e.py
Note: The batch size for each GPU is scaled automatically, so there is no need to modify batch_size in the configuration files.
Single-GPU evaluation:
export CUDA_VISIBLE_DEVICES=0
python val.py --config configs/opusv1_nusc-occ3d/opusv1-t_r50_704x256_8f_nusc-occ3d_100e.py --weights path/to/checkpoints
Multi-GPU evaluation:
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
bash dist_val.sh 8 configs/opusv1_nusc-occ3d/opusv1-t_r50_704x256_8f_nusc-occ3d_100e.py --weights path/to/checkpoints
Visualizing results
python visualize.py --config configs/opusv1_nusc-occ3d/opusv1-t_r50_704x256_8f_nusc-occ3d_100e.py --weights path/to/checkpoints
Visualizing inputs and ground-truths
python visualize.py --config configs/opusv1_nusc-occ3d/opusv1-t_r50_704x256_8f_nusc-occ3d_100e.py --weights path/to/checkpoints --vis-input --vis-gt
If this work is helpful for your research, please consider citing the following entry.
@inproceedings{wang2024opus,
title={Opus: occupancy prediction using a sparse set},
author={Wang, Jiabao and Liu, Zhaojiang and Meng, Qiang and Yan, Liujiang and Wang, Ke and Yang, Jie and Liu, Wei and Hou, Qibin and Cheng, Mingming},
booktitle={Advances in Neural Information Processing Systems},
year={2024}
}
Our code is developed on top of the following open-source codebases:
We sincerely appreciate their amazing work.

