Lenna: Language Enhanced Reasoning Detection Assistant
With the fast-paced development of multimodal large language models (MLLMs), we can now converse with AI systems in natural language to understand images. However, the reasoning power and world knowledge embedded in large language models have been much less investigated and exploited for image perception tasks. In this work, we propose Lenna, a Language enhanced reasoning detection assistant, which utilizes the robust multimodal feature representation of MLLMs while preserving location information for detection. This is achieved by incorporating an additional <DET> token in the MLLM vocabulary that is free of explicit semantic context but serves as a prompt for the detector to identify the corresponding position. To evaluate the reasoning capability of Lenna, we construct a ReasonDet dataset to measure its performance on reasoning-based detection. For more details, please refer to the paper.
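The role of the <DET> token can be illustrated with a toy sketch. This is not Lenna's actual code: the vocabulary, the helper names (`add_special_token`, `det_positions`, `select_det_embeddings`) and the fake hidden states are all hypothetical. The sketch also assumes that the detector consumes the feature vector found at the <DET> position, which the description above does not spell out.

```python
# Toy illustration only: not Lenna's tokenizer, model, or detector.
DET_TOKEN = "<DET>"


def add_special_token(vocab, token):
    """Append `token` to the vocabulary if absent and return its id."""
    if token not in vocab:
        vocab[token] = len(vocab)
    return vocab[token]


def det_positions(token_ids, det_id):
    """Return the indices at which the <DET> token occurs in a sequence."""
    return [i for i, tid in enumerate(token_ids) if tid == det_id]


def select_det_embeddings(hidden_states, token_ids, det_id):
    """Pick the per-position feature vectors that sit at <DET> positions.

    Assumption: these vectors are what would be handed to a detector
    as a prompt.
    """
    return [hidden_states[i] for i in det_positions(token_ids, det_id)]


# Hypothetical four-word vocabulary, extended with the new token.
vocab = {"the": 0, "dog": 1, "is": 2, "at": 3}
det_id = add_special_token(vocab, DET_TOKEN)

# A made-up output sequence that ends with <DET>.
token_ids = [vocab["the"], vocab["dog"], vocab["is"], vocab["at"], det_id]

# Stand-in for per-token hidden states (one 2-d vector per position).
hidden_states = [[float(i), float(i) + 0.5] for i in range(len(token_ids))]

det_features = select_det_embeddings(hidden_states, token_ids, det_id)
```

In this sketch the token carries no meaning of its own; it only marks which position's features would be forwarded for localization.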
Lenna Architecture
Getting Started
1. Installation
We use an A100 GPU for training and inference.
Clone our repository and create a conda environment:
When the model has finished loading, you will see the following prompt:
[Lenna] Please input your caption: {input your caption}
[Lenna] Input prompt: Please detect the {your caption} in this image.
[Lenna] Please input the image path: {input your image path}
Replace {input your caption} with a description of the object you want to detect, and {input your image path} with the path to your image.
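The prompt shown above appears to be built by substituting the caption into a fixed template. A minimal sketch of that substitution follows; the names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical and not taken from the repository, and the whitespace trimming and empty-caption check are assumptions added for illustration.

```python
# Template text copied from the "[Lenna] Input prompt:" line above.
PROMPT_TEMPLATE = "Please detect the {caption} in this image."


def build_prompt(caption: str) -> str:
    """Fill the caption into the detection prompt template."""
    caption = caption.strip()  # assumption: surrounding whitespace is dropped
    if not caption:
        raise ValueError("caption must not be empty")
    return PROMPT_TEMPLATE.format(caption=caption)


example_prompt = build_prompt("  red car on the left ")
```

A referring description such as "red car on the left" then becomes a full instruction for the model.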
Updates
2023-12-28 Inference code and the Lenna-7B model are released.
@article{wei2023lenna,
title={Lenna: Language enhanced reasoning detection assistant},
author={Wei, Fei and Zhang, Xinyu and Zhang, Ailing and Zhang, Bo and Chu, Xiangxiang},
journal={arXiv preprint arXiv:2312.02433},
year={2023}
}