We integrate SOTA (state-of-the-art) models and provide a vision-oriented multi-modal framework. It is not an LLM (large language model); rather, it comprises multiple large-scale models, some of which are built on top of cutting-edge foundation models.
Purposes
The surging momentum of generative AI (GAI) heralds the dawn of a new era in Artificial General Intelligence (AGI). LLMs and CV multi-modal large-scale models are the two dominant trends of the GAI age. ChatGPT and GPT-4 have set a high bar for LLMs, while CV multi-modal large-scale models are still emerging.
We have built a solid foundation for AI innovation and standardized data development, and we are rolling out this project to support the community working on CV multi-modal large-scale models. This project has the following purposes:
Provide a unified multi-modal framework for different applications based on multi-modal foundation models.
Integrate SOTA vision models into a complete multi-modal platform, leveraging the strongest components of each model.
Focus on vision-oriented AI to accelerate CV development, which still lags behind the current state of LLMs.
Installation
The code requires python>=3.8, as well as pytorch>=1.7 and torchvision>=0.8. Please follow the instructions here to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
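For example, on a machine with a recent CUDA toolkit, PyTorch and TorchVision can typically be installed with a single pip command; note that the exact index URL depends on your CUDA version, and cu118 below is only an assumption:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118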
Please follow the instructions here to install Meta SAM, or install it directly with pip:
pip install segment_anything
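Alternatively, the package can be installed from the official Segment Anything repository, which is the installation path the upstream project documents:

pip install git+https://github.com/facebookresearch/segment-anything.git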
The following optional dependencies are necessary for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format. jupyter is also required to run the example notebooks.
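A typical set of these optional packages (exact versions unpinned here as an assumption) can be installed as follows: opencv-python and pycocotools for mask post-processing and COCO export, matplotlib and jupyter for the example notebooks, and onnx plus onnxruntime for ONNX export:

pip install opencv-python pycocotools matplotlib onnxruntime onnx jupyter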