Foundational models can be consumed on demand: you pay per character, based on the combined length of the prompt and the model's response (except for embedding models, where the response isn't counted). In the table below, one transaction is one character, so 10,000 transactions = 10,000 characters.
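As a rough illustration of the per-character rule above, the following sketch counts billable transactions for a single on-demand call. The function names and the `price_per_10k` parameter are hypothetical, no real price is assumed, and two further assumptions are made: a "character" is taken to be one Python string element (a Unicode code point), and usage below 10,000 characters is prorated linearly. The page does not confirm either detail.

```python
from decimal import Decimal


def billable_characters(prompt: str, response: str, is_embedding: bool = False) -> int:
    """Return the number of billable transactions (characters) for one call."""
    # Embedding models are billed on the prompt only; the response is not counted.
    if is_embedding:
        return len(prompt)
    # All other models are billed on prompt length plus response length.
    return len(prompt) + len(response)


def on_demand_cost(characters: int, price_per_10k: Decimal) -> Decimal:
    """Estimate cost from a hypothetical price per 10,000 transactions."""
    # Assumption: partial blocks of 10,000 characters are prorated linearly.
    return Decimal(characters) / Decimal(10_000) * price_per_10k
```

For example, `billable_characters("hello", "world!")` counts 11 characters, while the same call with `is_embedding=True` counts only the 5 prompt characters.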
Additionally, you can host private replicas of foundational models and create fine-tuned models on dedicated AI clusters. Dedicated AI clusters come in two types: hosting and fine-tuning. You create a hosting cluster by assigning AI units to it based on the model you want to host and the expected call volume to the model. Fine-tuning clusters require two AI units of the specific model you want to fine-tune. Once you create a fine-tuned model in a fine-tuning cluster, you can host it on your hosting cluster.
Dedicated AI clusters require a minimum commitment of 744 unit-hours (per cluster) for hosting models. Fine-tuning clusters require a minimum of 1 unit-hour.
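The minimum commitments above can be sketched as a small calculation. This is only an illustration under stated assumptions: the function name is hypothetical, unit-hours are assumed to equal AI units multiplied by hours, and the minimum is assumed to act as a simple floor on the billed amount. The page does not spell out either point.

```python
# Minimums and unit counts stated in the text above.
HOSTING_MIN_UNIT_HOURS = 744      # per hosting cluster; 744 = 31 days x 24 hours
FINE_TUNING_MIN_UNIT_HOURS = 1    # per fine-tuning cluster
FINE_TUNING_AI_UNITS = 2          # a fine-tuning cluster needs two AI units


def billable_unit_hours(cluster_type: str, ai_units: int, hours: float) -> float:
    """Return the unit-hours billed for one cluster, applying the minimum."""
    if cluster_type == "hosting":
        minimum = HOSTING_MIN_UNIT_HOURS
    elif cluster_type == "fine-tuning":
        minimum = FINE_TUNING_MIN_UNIT_HOURS
    else:
        raise ValueError(f"unknown cluster type: {cluster_type!r}")
    # Assumption: unit-hours consumed = AI units x hours the cluster exists.
    used = ai_units * hours
    # Assumption: the minimum commitment is a floor on what is billed.
    return max(used, minimum)
```

Under these assumptions, a one-unit hosting cluster kept for 100 hours is still billed 744 unit-hours, and a fine-tuning job on `FINE_TUNING_AI_UNITS` units for 3 hours is billed 6 unit-hours.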
OCI Generative AI
Product |
Comparison Price (/vCPU) * |
Unit price |
Unit |
Oracle Cloud Infrastructure Generative AI - Cohere Rerank - Dedicated |
Cluster Hour |
||
Oracle Cloud Infrastructure Generative AI - Meta Llama 4 Scout |
10,000 Transactions |
||
Oracle Cloud Infrastructure Generative AI - Meta Llama 4 Maverick |
10,000 Transactions |
||
Oracle Cloud Infrastructure Generative AI - Large Cohere |
10,000 Transactions |
||
Oracle Cloud Infrastructure Generative AI - Small Cohere |
10,000 Transactions |
||
Oracle Cloud Infrastructure Generative AI - Embed Cohere |
10,000 Transactions |
||
Oracle Cloud Infrastructure Generative AI - Large Meta |
10,000 Transactions |
||
Oracle Cloud Infrastructure Generative AI - Meta Llama 3.1 405B |
10,000 Transactions |
||
Oracle Cloud Infrastructure Generative AI - Meta Llama 3.2 90B Vision |
10,000 Transactions |
||
Oracle Cloud Infrastructure Generative AI - Large Cohere - Dedicated |
AI unit per hour |
||
Oracle Cloud Infrastructure Generative AI - Small Cohere - Dedicated |
AI unit per hour |
||
Oracle Cloud Infrastructure Generative AI - Embed Cohere - Dedicated |
AI unit per hour |
||
Oracle Cloud Infrastructure Generative AI - Large Meta - Dedicated |
AI unit per hour |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 or Grok 4 - Input Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 or Grok 4 - Cached Input Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 or Grok 4 - Output Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Mini - Input Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Mini - Cached Input Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Mini - Output Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Fast - Input Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Fast - Cached Input Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Fast - Output Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Mini Fast - Input Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Mini Fast - Cached Input Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Mini Fast - Output Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Code - Grok-Code-Fast-1 - Input Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Code - Grok-Code-Fast-1 - Cached Input Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Code - Grok-Code-Fast-1 - Output Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Pro - Input Tokens - Text, Image, Audio, and Video less than 200K input tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Pro - Input Tokens - Text, Image, Audio, and Video greater than 200K input tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Pro - Output Tokens - Text Output less than 200K input tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Pro - Output Tokens - Text Output greater than 200K input tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Flash GA - Input Tokens - Text, Image, and Video |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Flash GA - Input Tokens - Audio |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Flash GA - Output Tokens - Text |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Flash Lite - Input Tokens - Text, Image, and Video |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Flash Lite - Input Tokens - Audio |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Flash Lite - Output Tokens - Text |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Fast - Input Tokens less than 128K Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Fast - Input Tokens greater than 128K Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Fast - Cached Input Tokens less than 128K Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Fast - Cached Input Tokens greater than 128K Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Fast - Output Tokens less than 128K Tokens |
1,000,000 Tokens |
||
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Fast - Output Tokens greater than 128K Tokens |
1,000,000 Tokens |
- A transaction is a character. 10,000 transactions = 10,000 characters
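Several rows in the table are priced per 1,000,000 tokens rather than per character, with separate rows for input, cached input, and output tokens, and some rows split by an input-size threshold (200K or 128K tokens). A minimal sketch of that arithmetic follows. All rates are hypothetical parameters, the function names are illustrative, cached input tokens are assumed to be counted separately from other input tokens, and a request exactly at the threshold is assumed to fall in the lower tier. None of these details are confirmed by the table.

```python
from decimal import Decimal

TOKENS_PER_PRICING_UNIT = Decimal(1_000_000)  # the table's unit for token-priced rows


def pick_rate(input_tokens: int, threshold: int,
              below_rate: Decimal, above_rate: Decimal) -> Decimal:
    """Choose between the 'less than' and 'greater than' rows for a request."""
    # Assumption: a request exactly at the threshold uses the lower-tier rate.
    return below_rate if input_tokens <= threshold else above_rate


def token_cost(input_tokens: int, cached_input_tokens: int, output_tokens: int,
               input_rate: Decimal, cached_rate: Decimal,
               output_rate: Decimal) -> Decimal:
    """Estimate cost from hypothetical per-1,000,000-token rates."""
    # Assumption: input_tokens excludes cached tokens, which are billed at cached_rate.
    total = (Decimal(input_tokens) * input_rate
             + Decimal(cached_input_tokens) * cached_rate
             + Decimal(output_tokens) * output_rate)
    return total / TOKENS_PER_PRICING_UNIT
```

With placeholder rates of 2 for input and 4 for output, 500,000 input tokens and 250,000 output tokens come to 1 + 1 = 2 currency units.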