Get the most out of Google Cloud TPUs with the ease of 🤗 transformers
Tensor Processing Units (TPUs) are AI accelerators made by Google to optimize
performance and cost across AI workloads, from training to inference.
This repository exposes an interface similar to the one the Hugging Face transformers library provides for interacting with
a multitude of models developed by research labs, institutions and the community.
We aim to provide our users with the best possible performance on Google Cloud TPUs for both training and inference,
working closely with Google and Google Cloud to make this a reality.
Supported Models and Tasks
We currently support a few LLMs for text generation:
💎 Gemma (2b, 7b)
🦙 Llama2 (7b) and Llama3 (8b). With Text Generation Inference on Jetstream Pytorch, Llama3.1, Llama3.2 and Llama3.3 (text-only models) are also supported, up to 70B parameters.
💨 Mistral (7b)
Installation
optimum-tpu is released as a package on PyPI, compatible with your usual Python dependency management tools.
optimum-tpu provides a set of dedicated tools and integrations to leverage Cloud TPUs for inference, especially
on the latest TPU versions, v5e and v6e.
Support for other TPU versions will be added over time.
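Once the package is installed with your dependency manager, a quick way to confirm it is visible to the current Python environment is to query its installed metadata. The sketch below uses only the standard library's `importlib.metadata` and assumes the PyPI distribution name is `optimum-tpu`; the helper name `installed_version` is hypothetical and not part of optimum-tpu.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "optimum-tpu") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper; assumes the PyPI distribution name "optimum-tpu".
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


found = installed_version()
print(f"optimum-tpu {found}" if found else "optimum-tpu is not installed")
```

This only checks package metadata; it does not verify that a TPU is attached to the machine.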
Text-Generation-Inference
As part of the integration, we support a Text Generation Inference (TGI) backend that lets you deploy a model, serve
incoming HTTP requests, and execute them on Cloud TPUs.
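Because the TGI backend is driven over HTTP, any client that can send JSON can talk to it. The following is a minimal client sketch using only the Python standard library. It assumes a TGI server is already running and reachable at `http://localhost:8080` (a hypothetical address; adjust it to your deployment) and targets TGI's `/generate` route, which accepts `{"inputs": ..., "parameters": {...}}` and returns `{"generated_text": ...}`. The helper names `build_generate_payload` and `generate` are illustrative, not part of optimum-tpu.

```python
import json
import urllib.request

# Hypothetical address: point this at wherever your TGI server listens.
TGI_URL = "http://localhost:8080/generate"


def build_generate_payload(prompt: str, max_new_tokens: int = 20) -> dict:
    """Build the JSON body expected by TGI's /generate route."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def generate(prompt: str, url: str = TGI_URL, max_new_tokens: int = 20,
             timeout: float = 60.0) -> str:
    """POST a prompt to a running TGI server and return the generated text."""
    body = json.dumps(build_generate_payload(prompt, max_new_tokens)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.loads(response.read().decode("utf-8"))
    # The /generate route answers with {"generated_text": "..."}.
    return result["generated_text"]
```

With a server up, calling `generate("What is a Tensor Processing Unit?")` returns the completion as a string. Keeping payload construction in its own function makes it easy to add further generation parameters without touching the transport code.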