Generative AI Service Pricing


Foundational models can be consumed on demand, where you pay per character for the combined length of the prompt and the model's response. Embedding models are the exception: only the prompt is billed, not the response. In the table below, one transaction is one character, so 10,000 transactions equal 10,000 characters.
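As a rough illustration of the per-character rule above, the following sketch estimates the charge for a single on-demand call. The function name, the is_embedding flag, and the price argument are hypothetical; no prices appear in this list, so the rate must be supplied by the reader. It also assumes that Python's len() matches how the service counts characters, which the text does not confirm.

```python
CHARS_PER_PRICING_UNIT = 10_000  # the table prices on-demand models per 10,000 transactions (characters)


def on_demand_cost(prompt: str, response: str, price_per_10k: float,
                   is_embedding: bool = False) -> float:
    """Estimate the on-demand charge for one call.

    price_per_10k is a placeholder supplied by the caller, not a published rate.
    """
    # Embedding models bill the prompt only; other models bill prompt plus response.
    billable_chars = len(prompt) if is_embedding else len(prompt) + len(response)
    return billable_chars / CHARS_PER_PRICING_UNIT * price_per_10k


if __name__ == "__main__":
    # Hypothetical rate of 0.5 per 10,000 characters.
    print(on_demand_cost("a" * 6_000, "b" * 4_000, price_per_10k=0.5))
    print(on_demand_cost("a" * 5_000, "b" * 4_000, price_per_10k=0.5, is_embedding=True))
```

With the placeholder rate, the first call bills 10,000 characters and the second bills only the 5,000 prompt characters.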

Additionally, you can host private replicas of foundational models and create fine-tuned models on dedicated AI clusters. Dedicated AI clusters come in two types: hosting and fine-tuning. You create a hosting cluster by assigning AI units to it based on the model you want to host and the expected call volume to the model. Fine-tuning clusters require two AI units of the specific model you want to fine-tune. Once you create a fine-tuned model in a fine-tuning cluster, you can host it on your hosting cluster.

Dedicated AI clusters require a minimum commitment of 744 unit-hours (per cluster) for hosting models. Fine-tuning clusters require a minimum of 1 unit-hour.
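The minimum commitments above can be sketched as a small calculation. This is an illustration under stated assumptions, not the billing formula: it assumes unit-hours are AI units multiplied by hours, that the 744 unit-hour minimum applies to the total for each hosting cluster, and that the 1 unit-hour minimum applies in the same way to a fine-tuning cluster. The function names and the price argument are hypothetical.

```python
HOSTING_MIN_UNIT_HOURS = 744     # minimum commitment per hosting cluster (from the text)
FINE_TUNING_MIN_UNIT_HOURS = 1   # minimum for a fine-tuning cluster (from the text)
FINE_TUNING_UNITS = 2            # a fine-tuning cluster requires two AI units (from the text)


def hosting_cost(units: int, hours: float, price_per_unit_hour: float) -> float:
    """Estimate the charge for one hosting cluster.

    Assumes unit-hours = units * hours; price_per_unit_hour is a placeholder.
    """
    unit_hours = units * hours
    billable = max(unit_hours, HOSTING_MIN_UNIT_HOURS)
    return billable * price_per_unit_hour


def fine_tuning_cost(hours: float, price_per_unit_hour: float) -> float:
    """Estimate the charge for one fine-tuning cluster, which always uses two AI units."""
    unit_hours = FINE_TUNING_UNITS * hours
    billable = max(unit_hours, FINE_TUNING_MIN_UNIT_HOURS)
    return billable * price_per_unit_hour


if __name__ == "__main__":
    # Hypothetical rate of 2.0 per AI unit per hour.
    print(hosting_cost(units=1, hours=100, price_per_unit_hour=2.0))  # below the minimum
    print(fine_tuning_cost(hours=5, price_per_unit_hour=2.0))
```

Under these assumptions, a one-unit hosting cluster used for 100 hours is still billed for 744 unit-hours, while a five-hour fine-tuning run is billed for 10 unit-hours.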

OCI Generative AI

Product
Comparison Price (/vCPU) *
Unit price
Unit
Oracle Cloud Infrastructure Generative AI - Cohere Rerank - Dedicated


Cluster Hour
Oracle Cloud Infrastructure Generative AI - Meta Llama 4 Scout


10,000 Transactions
Oracle Cloud Infrastructure Generative AI - Meta Llama 4 Maverick


10,000 Transactions
Oracle Cloud Infrastructure Generative AI - Large Cohere


10,000 Transactions
Oracle Cloud Infrastructure Generative AI - Small Cohere


10,000 Transactions
Oracle Cloud Infrastructure Generative AI - Embed Cohere


10,000 Transactions
Oracle Cloud Infrastructure Generative AI - Large Meta


10,000 Transactions
Oracle Cloud Infrastructure Generative AI - Meta Llama 3.1 405B


10,000 Transactions
Oracle Cloud Infrastructure Generative AI - Meta Llama 3.2 90B Vision


10,000 Transactions
Oracle Cloud Infrastructure Generative AI - Large Cohere - Dedicated


AI unit per hour
Oracle Cloud Infrastructure Generative AI - Small Cohere - Dedicated


AI unit per hour
Oracle Cloud Infrastructure Generative AI - Embed Cohere - Dedicated


AI unit per hour
Oracle Cloud Infrastructure Generative AI - Large Meta - Dedicated


AI unit per hour
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 or Grok 4 - Input Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 or Grok 4 - Cached Input Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 or Grok 4 - Output Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Mini - Input Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Mini - Cached Input Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Mini - Output Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Fast - Input Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Fast - Cached Input Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Fast - Output Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Mini Fast - Input Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Mini Fast - Cached Input Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 3 Mini Fast - Output Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Code - Grok-Code-Fast-1 - Input Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Code - Grok-Code-Fast-1 - Cached Input Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Code - Grok-Code-Fast-1 - Output Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Pro - Input Tokens - Text, Image, Audio, and Video less than 200K input tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Pro - Input Tokens - Text, Image, Audio, and Video greater than 200K input tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Pro - Output Tokens - Text Output less than 200K input tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Pro - Output Tokens - Text Output greater than 200K input tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Flash GA - Input Tokens - Text, Image, and Video

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Flash GA - Input Tokens - Audio

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Flash GA - Output Tokens - Text

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Flash Lite - Input Tokens - Text, Image, and Video

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Flash Lite - Input Tokens - Audio

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - Google - Gemini 2.5 Flash Lite - Output Tokens - Text

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Fast - Input Tokens less than 128K Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Fast - Input Tokens greater than 128K Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Fast - Cached Input Tokens less than 128K Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Fast - Cached Input Tokens greater than 128K Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Fast - Output Tokens less than 128K Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - xAI - Grok 4 Fast - Output Tokens greater than 128K Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - Model Import

AI Unit Per Hour
Oracle Cloud Infrastructure Generative AI - OpenAI - gpt-oss-120b - Input Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - OpenAI - gpt-oss-120b - Output Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - OpenAI - gpt-oss-20b - Input Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - OpenAI - gpt-oss-20b - Output Tokens

1,000,000 Tokens
Oracle Cloud Infrastructure Generative AI - OpenAI - Dedicated

AI Unit Per Hour
  • A transaction is a character. 10,000 transactions = 10,000 characters
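Several rows in the table are priced per 1,000,000 tokens rather than per character, with separate rows for input, cached input, and output tokens, and in some cases separate rows on either side of a 128K or 200K input-token threshold. The sketch below shows one way to combine such rates. All names and prices are hypothetical. It assumes cached input tokens are counted separately from regular input tokens, and it treats an input of exactly the threshold size as the upper tier; the table does not state either point.

```python
from dataclasses import dataclass

TOKENS_PER_PRICING_UNIT = 1_000_000  # unit used by the token-priced rows in the table


@dataclass(frozen=True)
class TokenRates:
    """Placeholder prices per 1,000,000 tokens; not taken from the price list."""
    input: float
    cached_input: float
    output: float


def pick_tier(input_tokens: int, threshold: int,
              below: TokenRates, above: TokenRates) -> TokenRates:
    """Choose the rate set for a request based on its input size.

    Assumption: an input of exactly `threshold` tokens falls in the upper tier.
    """
    return below if input_tokens < threshold else above


def token_cost(input_tokens: int, cached_input_tokens: int,
               output_tokens: int, rates: TokenRates) -> float:
    """Estimate the charge for one request from its three token counts."""
    total = (input_tokens * rates.input
             + cached_input_tokens * rates.cached_input
             + output_tokens * rates.output)
    return total / TOKENS_PER_PRICING_UNIT


if __name__ == "__main__":
    small = TokenRates(input=2.0, cached_input=0.5, output=10.0)   # hypothetical
    large = TokenRates(input=4.0, cached_input=1.0, output=15.0)   # hypothetical
    rates = pick_tier(150_000, threshold=200_000, below=small, above=large)
    print(token_cost(150_000, 0, 20_000, rates))
```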
