Using OPT with Alpa
The OPT 125M–175B models are now supported in the Alpa project, which enables serving OPT-175B with more flexible parallelism strategies on older generations of GPUs, such as 40GB A100, V100, T4, and M60.
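For illustration, here is a minimal sketch of generating text through Alpa's llm_serving wrapper. The checkpoint size, weight path, and generation arguments below follow Alpa's published OPT examples and should be treated as assumptions to adapt to your installation.

```python
# Minimal sketch of serving an OPT model through Alpa's llm_serving wrapper.
# The checkpoint size, weight path, and generate() arguments are assumptions
# drawn from Alpa's OPT examples; adjust them to your setup.
from transformers import AutoTokenizer
from llm_serving.model.wrapper import get_model

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-2.7b")
tokenizer.add_bos_token = False  # Alpa's examples disable the BOS token

# The "alpa/..." prefix selects Alpa's parallelized backend; weights are
# loaded from the given path.
model = get_model(model_name="alpa/opt-2.7b", path="~/opt_weights")

input_ids = tokenizer("Paris is the capital city of", return_tensors="pt").input_ids
output_ids = model.generate(input_ids=input_ids, max_length=64, do_sample=True)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```

Because Alpa shards the model across available devices automatically, the same script scales from a single GPU to a multi-node cluster without code changes.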
Using OPT with Colossal-AI
The OPT models are now supported in Colossal-AI, which helps users deploy OPT training and inference efficiently and quickly, reducing the hardware budget for large AI models and lowering the labor cost of learning and deployment.
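As a rough sketch of what such a setup looks like, the snippet below wraps a Hugging Face OPT checkpoint with Colossal-AI's booster API. The Booster and GeminiPlugin names follow recent Colossal-AI releases and the API has changed between versions, so treat this as an assumption-laden outline rather than the project's official recipe.

```python
# Rough sketch of preparing OPT for training under Colossal-AI.
# Every Colossal-AI name here is an assumption based on recent releases;
# the API has changed between versions.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from transformers import OPTForCausalLM

# Reads rank/world size from the environment set up by torchrun;
# newer Colossal-AI releases drop the config argument entirely.
colossalai.launch_from_torch(config={})

model = OPTForCausalLM.from_pretrained("facebook/opt-1.3b")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Gemini is Colossal-AI's heterogeneous-memory ZeRO implementation.
booster = Booster(plugin=GeminiPlugin())
model, optimizer, *_ = booster.boost(model, optimizer)
# model/optimizer are now sharded; proceed with an ordinary training loop.
```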
Using OPT with CTranslate2
The OPT 125M–66B models can be executed with CTranslate2, a fast inference engine for Transformer models. The project integrates the SmoothQuant technique to enable 8-bit quantization of OPT models. See the usage example to get started.
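A minimal usage sketch follows. The 1.3B checkpoint and the output directory name are arbitrary choices; the conversion step uses the ct2-transformers-converter tool shipped with CTranslate2.

```python
# Minimal sketch of running an int8-quantized OPT model with CTranslate2.
# Convert the checkpoint first (shell command, shown here as a comment):
#   ct2-transformers-converter --model facebook/opt-1.3b \
#       --quantization int8 --output_dir opt-1.3b-ct2
# The 1.3B checkpoint is an arbitrary choice; any OPT size up to 66B works.
import ctranslate2
from transformers import AutoTokenizer

generator = ctranslate2.Generator("opt-1.3b-ct2")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")

prompt = "Hey, are you conscious? Can you talk to me?"
# OPT expects a </s> token at the start; tokenizer.encode already inserts it.
start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

results = generator.generate_batch([start_tokens], max_length=64, sampling_topk=10)
print(tokenizer.decode(results[0].sequences_ids[0]))
```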
Using OPT with FasterTransformer
The OPT models can be served with FasterTransformer, a highly optimized inference framework written and maintained by NVIDIA. We provide instructions to convert OPT checkpoints into FasterTransformer format and a usage example with some benchmark results.
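As an illustration of the conversion step, the sketch below shells out to the OPT converter script in the FasterTransformer repository. The script path and flag names mirror FasterTransformer's GPT conversion utilities and are assumptions; consult the repository for the exact interface.

```python
# Sketch of converting a Hugging Face OPT checkpoint to FasterTransformer
# format by invoking the converter script from the FasterTransformer repo.
# Script path and flags are assumptions modeled on the GPT utilities.
import subprocess

subprocess.run(
    [
        "python", "examples/pytorch/gpt/utils/huggingface_opt_convert.py",
        "-i", "facebook/opt-1.3b",   # source Hugging Face checkpoint (assumed)
        "-o", "ft_models/opt-1.3b",  # output directory (assumed)
        "-i_g", "1",                 # tensor-parallel GPU count (assumed)
    ],
    check=True,
)
```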
If you have any questions, bug reports, or feature requests regarding either the codebase or the models released in the projects section, please don't hesitate to post on our GitHub Issues page.