CASIA | Li Auto | AIR, Tsinghua University | Beihang University | HKUST | HKU
News
We have released the latest version of our dataset and code.
Introduction
We introduce the new task of outdoor 3D dense captioning. The input is a LiDAR point cloud and a set of RGB images captured by a panoramic camera rig. The expected output is a set of object boxes with captions. To tackle this task, we propose the TOD3Cap network, which leverages the BEV representation to generate object box proposals and integrates Relation Q-Former with LLaMA-Adapter to generate rich captions for these objects. We also introduce the TOD3Cap dataset, to our knowledge the largest for 3D dense captioning in outdoor scenes, which contains 2.3M descriptions of 64.3K outdoor objects from 850 scenes in nuScenes.
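To make the expected output concrete, here is a minimal sketch of how a set of captioned object boxes could be represented. This is not the repository's API: the names `Box3D`, `CaptionedObject`, and `filter_by_score`, the box parameterization, the confidence score, and the example captions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box3D:
    """Hypothetical 3D box: center (x, y, z), size (w, l, h), yaw in radians."""
    center: Tuple[float, float, float]
    size: Tuple[float, float, float]
    yaw: float


@dataclass
class CaptionedObject:
    """One dense-captioning output: a box plus its caption."""
    box: Box3D
    caption: str
    score: float = 1.0  # hypothetical detection confidence, not from the source


def filter_by_score(objects: List[CaptionedObject], threshold: float) -> List[CaptionedObject]:
    """Keep predictions whose confidence is at least `threshold`."""
    return [obj for obj in objects if obj.score >= threshold]


# Illustrative values only; these captions are not taken from the TOD3Cap dataset.
predictions = [
    CaptionedObject(
        Box3D((10.0, 2.0, 0.5), (1.9, 4.5, 1.6), 0.0),
        "a white car parked on the right side of the road",
        0.9,
    ),
    CaptionedObject(
        Box3D((5.0, -3.0, 0.9), (0.6, 0.6, 1.8), 1.57),
        "a pedestrian walking near the crosswalk",
        0.3,
    ),
]

kept = filter_by_score(predictions, 0.5)
for obj in kept:
    print(obj.box.center, obj.caption)
```

Keeping the box and caption in one record mirrors the task definition, where every detected object carries its own description; the actual data format used by the released code may differ.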
Note
This repository will be updated soon, including:
Initialization.
Uploading the TOD3Cap Dataset.
Uploading the Annotation Tools.
Uploading the code of the TOD3Cap network.
Uploading the installation guidelines.
Uploading the Training and Evaluation scripts.
Uploading the Visualization scripts for ground-truth data and predicted results.
Two-branch implementation: see tod3cap_camera/README.md.
Qualitative results
Citation
If you find our work useful in your research, please consider citing:
@article{jin2024tod3cap,
title={TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes},
author={Jin, Bu and Zheng, Yupeng and Li, Pengfei and Li, Weize and Zheng, Yuhang and Hu, Sujie and Liu, Xinyu and Zhu, Jinwei and Yan, Zhijie and Sun, Haiyang and others},
journal={arXiv preprint arXiv:2403.19589},
year={2024}
}
Acknowledgments
We would like to thank Dave Zhenyu Chen at Technical University of Munich for his valuable proofreading and insightful suggestions. We would also like to thank Lijun Zhou and the student volunteers at Li Auto for their efforts in building the TOD3Cap dataset.
Our code is built on top of open-source GitHub repositories. We thank all the authors who made their code public, which greatly accelerated our progress. If you find these works helpful, please consider citing them as well.