We integrate SOTA (state-of-the-art) models and provide a vision-oriented multi-modal framework. It is not an LLM (large language model); rather, it comprises multiple large-scale models, some of which are built on top of cutting-edge foundation models.
Purposes
The surging momentum of generative AI (GAI) heralds the dawn of a new era in Artificial General Intelligence (AGI). LLMs and CV multi-modal large-scale models are the two dominant trends of the GAI age. ChatGPT and GPT-4 have set a high bar for LLMs, while CV multi-modal large-scale models are still emerging.
We have built a solid foundation for AI innovation and standardized data development, and we are rolling out this project to support the community working on CV multi-modal large-scale models. This project has the following purposes:
Provide a unified multi-modal framework for different applications based on multi-modal foundation models.
Integrate SOTA vision models into a complete multi-modal platform, leveraging the strongest components of each model.
Focus on vision-oriented AI to accelerate CV development, which still lags behind the current state of LLMs.
Installation
The code requires python>=3.8, as well as pytorch>=1.7 and torchvision>=0.8. Please follow the instructions here to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
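For example, on a machine with a recent CUDA toolkit, PyTorch and TorchVision can typically be installed with a single pip command; note that the exact index URL depends on your CUDA version, and cu118 below is only an assumption:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118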
Please follow the instructions here to install Meta SAM, or install it directly with pip:
pip install segment_anything
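Alternatively, the package can be installed from the official Segment Anything repository, which is the installation path the upstream project documents:

pip install git+https://github.com/facebookresearch/segment-anything.git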
The following optional dependencies are necessary for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format. jupyter is also required to run the example notebooks.
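A typical set of these optional packages (exact versions unpinned here as an assumption) can be installed as follows: opencv-python and pycocotools for mask post-processing and COCO export, matplotlib and jupyter for the example notebooks, and onnx plus onnxruntime for ONNX export:

pip install opencv-python pycocotools matplotlib onnxruntime onnx jupyter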