HeatWave GenAI licensing is the commercial wrapper around the in-database generative AI capabilities Oracle ships with MySQL HeatWave on OCI, AWS, and Azure. HeatWave GenAI includes in-database LLM inference using a curated catalogue of open-weight foundation models (Mistral variants, Llama family), an in-database vector store with HNSW indexing, the HeatWave Chat natural-language interface, and the in-database scale-out inference architecture that uses the existing HeatWave cluster compute for LLM workloads. The headline commercial position is that all of this is bundled with the HeatWave consumption rate — no separate per-token billing for the included models.
This piece works the HeatWave GenAI licensing position the way an Oracle insider sizing a customer-data analytics deployment would work it: the licensing model first, the bundled model catalogue second, the BYOM and fine-tuning rules third, the multi-cloud cost benchmark fourth, and the buyer-side commercial framework last. For the related Oracle Generative AI Service licensing context see our OCI Generative AI Service analysis; for the broader Oracle Cloud commercial framework see the Oracle Cloud licensing master guide.
The HeatWave GenAI licensing model
Bundled with HeatWave MySQL shape
HeatWave GenAI is bundled with the HeatWave MySQL managed service across OCI, AWS, and Azure. Customers pay the standard HeatWave shape-based hourly rate (per HeatWave node-hour) and gain access to the in-database LLM inference, the in-database vector store, the HeatWave Chat interface, and the natural-language SQL capabilities at no separate per-token uplift. The commercial mechanic is identical to the rest of HeatWave — the shape size determines the throughput capacity, the hourly rate covers the workload, and the inference happens inside the cluster.
The bundled model catalogue
The HeatWave GenAI model catalogue includes a curated set of open-weight foundation models that load directly into the HeatWave cluster for in-database inference. The 2026 catalogue includes Mistral variants (7B, 8x7B Mixtral), Meta Llama family variants (Llama 3 8B, Llama 3 70B at appropriate cluster sizes), and a set of embedding models for vector store population. The catalogue expands over time as additional open-weight models become available — Oracle has indicated a continuing commitment to expanding the bundled catalogue. Customers do not pay a per-token rate for inference against the bundled models — the inference runs against the existing HeatWave cluster compute under the shape-based hourly billing.
The in-database vector store
HeatWave includes an in-database vector store with HNSW indexing for vector similarity search. The vector store is implemented as a column type extension on MySQL tables, and similarity search uses the same HeatWave query acceleration architecture as the analytical workloads. The licensing position is the same as for HeatWave GenAI generally: bundled with the shape-based hourly rate, with no separate metering per stored vector or per similarity query. The cluster does, however, have to be large enough to hold the vector embeddings and the supporting metadata.
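Because the vector store has to fit in the cluster, a rough memory floor is a useful first sizing input. The sketch below is an estimate under stated assumptions, not a HeatWave sizing formula: it assumes each embedding dimension is stored as a 4-byte float32, treats per-row metadata as a flat hypothetical overhead, and leaves out index structures such as the HNSW graph, which add further memory on top.

```python
# Rough memory floor for an in-database vector store.
# Assumption: each embedding dimension is stored as a 4-byte float32.
BYTES_PER_FLOAT32 = 4


def vector_store_bytes(num_vectors: int, dimensions: int,
                       metadata_bytes_per_row: int = 0) -> int:
    """Raw bytes for embeddings plus per-row metadata.

    Excludes index structures (such as HNSW graph links) and any
    compression, so treat the result as a floor, not a final size.
    """
    if num_vectors < 0 or dimensions <= 0 or metadata_bytes_per_row < 0:
        raise ValueError("counts must be non-negative and dimensions positive")
    per_row = dimensions * BYTES_PER_FLOAT32 + metadata_bytes_per_row
    return num_vectors * per_row


def to_gib(num_bytes: int) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Illustrative inputs only: 10 million chunks, 384-dimensional
    # embeddings, and an assumed 512 bytes of metadata per row.
    raw = vector_store_bytes(10_000_000, 384, metadata_bytes_per_row=512)
    print(f"{to_gib(raw):.1f} GiB before index overhead")
```

The row count, dimension and metadata figure are placeholders; replace them with the customer's own corpus size and the embedding model's actual output dimension.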
For workloads where the bundled HeatWave GenAI model catalogue is sufficient (and for most enterprise document classification, summarisation, embedding generation, and natural-language reporting workloads it is), HeatWave GenAI is materially less expensive than the per-token consumption model of Oracle Generative AI Service, OpenAI, Anthropic, or Cohere. The shape-based bundle places no per-token meter on inference; the practical ceiling is the throughput the cluster can sustain. The economic question therefore becomes the HeatWave cluster size required to support the workload, not the per-token rate.
The HeatWave shape and cost mechanics
The HeatWave shape catalogue
The shape-based pricing covers the database, the analytical query engine, the in-database LLM inference, the vector store, and the HeatWave Chat interface. The single line item is the shape-based hourly rate against the chosen cluster configuration. For a HeatWave Standalone 32 ECPU deployment running 24x7, the annual cost lands at approximately $51,500 covering the full database and AI workload — a material commercial advantage over equivalent deployments that separately meter the database compute and the per-token AI inference.
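The annual figure is simply a shape-hour rate multiplied by the hours the cluster runs. The sketch back-solves the hourly rate implied by the $51,500 figure over 8,760 hours (24x7, non-leap year); that rate is derived from this article's number, not taken from an Oracle price list.

```python
# Back-solve the hourly rate implied by an annual 24x7 figure, then re-project it.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years


def implied_hourly_rate(annual: float, hours: float = HOURS_PER_YEAR) -> float:
    """Hourly shape rate that produces `annual` over `hours` billed hours."""
    return annual / hours


def annual_cost(hourly_rate: float, hours: float = HOURS_PER_YEAR) -> float:
    """Annual cost of a shape billed for `hours` hours."""
    return hourly_rate * hours


if __name__ == "__main__":
    rate = implied_hourly_rate(51_500)  # the 32 ECPU, 24x7 figure above
    print(f"implied rate: ${rate:.2f} per hour")
    print(f"re-projected: ${annual_cost(rate):,.0f} per year")
```

Working from the hourly rate rather than the annual headline makes it easier to compare quotes across OCI, AWS and Azure, where rates vary by region and provider.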
The break-even versus per-token providers
The break-even between HeatWave GenAI's shape-based bundle and a per-token external provider depends on the inference volume and the model class. For a workload generating 100 million tokens per month of inference against Llama 3 70B, the HeatWave Standalone 32 ECPU cluster at $51,500 per year covers both the database and the inference; the equivalent per-token consumption against Oracle Generative AI Service Llama 3 70B at indicative rates would land at approximately $80-120k per year for the inference alone, plus the database hosting cost separately. For high-volume bundled-model workloads, HeatWave GenAI typically wins on per-query economics.
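The break-even point is where the annual shape cost equals the per-token path's inference spend plus any separately hosted database. A minimal sketch, assuming a steady monthly token volume and a flat per-million-token rate: it derives the implied rate from the article's $80-120k band for 100 million tokens per month, then solves for the volume at which the two paths cost the same. It does not check whether the cluster can actually sustain that volume; that is a separate sizing question.

```python
# Break-even between a shape-billed HeatWave cluster and a per-token provider.
TOKENS_PER_MILLION = 1_000_000


def implied_rate_per_million(annual_spend: float, tokens_per_month: float) -> float:
    """Per-million-token rate implied by an annual spend at a steady monthly volume."""
    return annual_spend / (tokens_per_month * 12 / TOKENS_PER_MILLION)


def external_annual_cost(tokens_per_month: float, rate_per_million: float,
                         db_hosting_annual: float = 0.0) -> float:
    """Per-token inference plus a separately hosted database, per year."""
    return db_hosting_annual + tokens_per_month * 12 / TOKENS_PER_MILLION * rate_per_million


def break_even_tokens_per_month(heatwave_annual: float, rate_per_million: float,
                                db_hosting_annual: float = 0.0) -> float:
    """Monthly volume at which both paths cost the same.

    Above this volume the shape-based bundle is cheaper; below it the
    per-token path is cheaper. Returns 0.0 when external database hosting
    alone already matches the HeatWave cost (the bundle wins at any volume).
    """
    budget_for_tokens = heatwave_annual - db_hosting_annual
    if budget_for_tokens <= 0:
        return 0.0
    return budget_for_tokens / rate_per_million * TOKENS_PER_MILLION / 12


if __name__ == "__main__":
    # The article's band: $80k-$120k per year for 100M tokens per month.
    for spend in (80_000, 120_000):
        rate = implied_rate_per_million(spend, 100_000_000)
        volume = break_even_tokens_per_month(51_500, rate)
        print(f"${rate:.2f}/M tokens -> break-even at {volume / 1e6:.1f}M tokens/month")
```

Leaving `db_hosting_annual` at zero is the conservative case for HeatWave; adding the external path's real database cost lowers the break-even volume.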
The BYOM and fine-tuning rules
Loading additional open-weight models
HeatWave GenAI supports loading additional open-weight models from the supported catalogue beyond the default bundled set. The customer can load a specific model variant (a different Llama parameter size, a specialised fine-tuned variant from Hugging Face, a domain-specific open-weight model) into the HeatWave cluster and run inference against the same shape-based consumption rate. The constraint is the supported model architecture — HeatWave GenAI supports the open-weight model families that the in-database inference engine knows how to execute, not arbitrary third-party model formats.
Fine-tuning the bundled models
HeatWave GenAI supports fine-tuning the bundled foundation models with customer-supplied training data. The fine-tuning workflow runs against the HeatWave cluster compute, producing a fine-tuned model artifact that lives inside the customer's HeatWave deployment. The licensing position is the same — the fine-tuning consumption runs against the shape-based hourly rate, no separate per-token or per-training-hour uplift. The cluster has to be sized to absorb both the production inference load and the fine-tuning workload during the training window.
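Sizing for "both the production inference load and the fine-tuning workload" amounts to checking peak demand during the training window against usable capacity. The sketch below is illustrative only: the capacity unit is a hypothetical normalised measure (not an Oracle metric), the 20% headroom is an assumed safety margin, and the load figures would have to come from the customer's own measurements.

```python
from dataclasses import dataclass


@dataclass
class LoadProfile:
    """Demand in a hypothetical normalised capacity unit (not an Oracle metric)."""
    database: float
    inference: float
    vector_search: float
    fine_tuning: float = 0.0  # only present during the training window

    def peak(self, training_window: bool) -> float:
        steady = self.database + self.inference + self.vector_search
        return steady + (self.fine_tuning if training_window else 0.0)


def fits(profile: LoadProfile, cluster_capacity: float, headroom: float = 0.2) -> bool:
    """True if peak demand during training stays within capacity minus headroom."""
    usable = cluster_capacity * (1 - headroom)
    return profile.peak(training_window=True) <= usable


if __name__ == "__main__":
    load = LoadProfile(database=10, inference=12, vector_search=2, fine_tuning=6)
    for capacity in (32, 40):
        print(capacity, "fits" if fits(load, capacity) else "undersized for training window")
```

The design choice is to size against the training-window peak rather than the steady state; the alternative is to schedule fine-tuning into low-traffic windows and size for the steady state instead.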
Closed-weight model integration
Closed-weight foundation models (GPT-4, Claude, Cohere Command R+) are not directly loadable into the HeatWave cluster. Customers wanting access to those models would use HeatWave GenAI as the in-database layer for the bundled models and call out to the external provider API for the closed-weight inference — paying the external provider per-token rate for that subset of the workload. The hybrid architecture is the typical pattern for customers requiring both the cost-efficient bundled inference and the premium closed-weight capability.
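A hybrid architecture needs an explicit routing rule and a way to see what the external subset costs. The sketch assumes a hypothetical allow-list of task types that the bundled models handle adequately, plus a flat external per-million-token rate; the task names, the rate, and the `route` helper are illustrative and are not part of any HeatWave or provider API.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical allow-list of task types the bundled open-weight models handle
# adequately; confirm each entry by benchmarking on the customer's own data.
BUNDLED_TASKS = {"classification", "summarisation", "embedding", "nl_reporting"}


@dataclass
class Request:
    task: str
    tokens: int


def route(req: Request) -> str:
    """'bundled' for in-cluster inference, 'external' for a closed-weight provider API."""
    return "bundled" if req.task in BUNDLED_TASKS else "external"


def external_spend(requests: Iterable[Request], rate_per_million: float) -> float:
    """Per-token spend on the externally routed subset.

    Bundled traffic adds no per-token charge; its cost sits in the shape rate.
    """
    external_tokens = sum(r.tokens for r in requests if route(r) == "external")
    return external_tokens / 1_000_000 * rate_per_million


if __name__ == "__main__":
    batch = [Request("classification", 900_000), Request("multilingual_reasoning", 100_000)]
    print(external_spend(batch, rate_per_million=15.0))
```

Defaulting unknown task types to the external provider is the safe choice for quality; defaulting them to the bundled model is the cheaper one. Which default fits depends on the benchmark results.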
The multi-cloud HeatWave GenAI deployment
HeatWave on AWS
HeatWave runs as a managed service on AWS with the HeatWave GenAI capabilities included. The consumption bills against the customer's AWS commercial commitment or pay-as-you-go arrangement. The HeatWave hourly rate on AWS is broadly comparable with the OCI rate at the equivalent shape size. The architectural advantage is that HeatWave on AWS keeps the database and the inference workload inside the AWS region the customer's application tier runs in — eliminating the cross-cloud egress for the database workload.
HeatWave on Azure
HeatWave runs as a managed service on Azure with the HeatWave GenAI capabilities included. The consumption bills against the customer's Azure commercial commitment (typically through the Microsoft Customer Agreement or Enterprise Agreement) or pay-as-you-go arrangement. The HeatWave hourly rate on Azure is broadly comparable with OCI and AWS. The architectural advantage is the proximity to Azure-native application tiers and the absorption against the Microsoft commercial commitment.
The commercial provisions
The commercial provisions for HeatWave GenAI deployments should include three forensic safeguards. First, the shape escalation policy: HeatWave allows online resize of the cluster shape, but the billing consequence of a larger shape follows automatically, with no commercial checkpoint. The provision should require commercial sign-off before any shape change. Second, the auto-scale governance: HeatWave Lakehouse and HeatWave Autopilot can automatically scale the cluster based on workload demand. The provision should cap the maximum scaled shape and require notification of scale events. Third, the multi-cloud arbitrage provision: the customer should retain the ability to migrate the HeatWave workload between OCI, AWS, and Azure depending on the broader commercial position with each provider.
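The first two safeguards can be expressed as a policy check that a customer might run against scale events. This sketches the contractual logic only; the `ShapeChange` record, the ECPU cap and the `evaluate` helper are hypothetical and are not connected to any HeatWave or OCI API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ShapeChange:
    current_ecpus: int
    proposed_ecpus: int
    automatic: bool            # True if triggered by auto-scaling, not an operator
    commercial_signoff: bool   # True if the commercial owner has approved it


def evaluate(change: ShapeChange, scaled_cap_ecpus: int) -> Tuple[bool, List[str]]:
    """Apply the two contractual safeguards to a proposed shape change."""
    notes: List[str] = []
    if change.proposed_ecpus == change.current_ecpus:
        return True, notes  # no change, nothing to govern
    if change.automatic:
        # Safeguard 2: every automatic scale event is notified and capped.
        notes.append("notify customer of scale event")
        if change.proposed_ecpus > scaled_cap_ecpus:
            notes.append("blocked: exceeds contractual scaled-shape cap")
            return False, notes
        return True, notes
    # Safeguard 1: operator-initiated changes need commercial sign-off first.
    if not change.commercial_signoff:
        notes.append("blocked: commercial sign-off required")
        return False, notes
    return True, notes
```

In this sketch the cap applies only to automatic scaling. An operator-initiated resize with commercial sign-off may exceed it, which mirrors the split between the two provisions above.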
"HeatWave GenAI is the most commercially friendly in-database LLM offering Oracle ships. The shape-based bundle covers unlimited inference against the curated open-weight model catalogue with no separate per-token metering. The buyer-side defence is to size the cluster forensically against the actual workload, to route closed-weight inference to the appropriate external provider, and to maintain the multi-cloud arbitrage as the negotiating lever."
An anonymised case study — global retailer, HeatWave GenAI document classification workload
A global retailer running a 24x7 HeatWave Standalone deployment on OCI sized the customer-feedback document classification pipeline against three architectural options in early 2026. Option A was the proposed Oracle account team architecture — HeatWave for the database, Oracle Generative AI Service Cohere Command R for the classification inference, with the per-token consumption billing against Universal Credits. Projected annual consumption: $480k of GenAI Service inference plus $120k of HeatWave hosting.
Option B was the buyer-side proposal: HeatWave GenAI with Llama 3 8B as the bundled classification model, running entirely inside the HeatWave cluster. The cluster sizing increased from 16 ECPUs to 32 ECPUs to absorb the inference workload, raising the HeatWave hosting cost from $120k to $240k annually. The classification inference became part of the shape-based bundle, with no separate per-token consumption. Net annual cost: $240k.
Option C was a hybrid architecture: HeatWave GenAI with Llama 3 8B for 90% of routine classifications, with the OCI Generative AI Service Cohere Command R+ fallback for the 10% of complex multi-language classifications. Projected combined cost: $290k annually.
The buyer-side recommendation was Option B with two commercial provisions. First, the HeatWave shape escalation was governed at the commercial level — the cluster could absorb the projected workload growth at the 32 ECPU shape for the contracted term, with a written cap on automatic shape escalation. Second, the multi-cloud arbitrage was preserved through a contractual provision allowing migration to HeatWave on AWS or Azure without licence penalty. Net annualised saving against the original Oracle account team proposal: $360k. The bundled model accuracy on the workload was benchmarked at 96% of the equivalent OCI Generative AI Service Cohere Command R performance — sufficient for the use case. For the broader Oracle commercial negotiation framework see our Oracle contract negotiation service.
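The case-study arithmetic can be checked in a few lines. The figures below are the annual projections quoted above, in thousands of USD. The article does not break Option C into hosting and inference components, so the sketch keeps it as a single combined figure rather than guessing at a split.

```python
# Annual cost per option, in thousands of USD, as quoted in the case study.
OPTIONS = {
    "A": {"heatwave_hosting": 120, "per_token_inference": 480},  # GenAI Service, Command R
    "B": {"heatwave_hosting": 240, "per_token_inference": 0},    # Llama 3 8B in-cluster
    "C": {"combined": 290},                                      # hybrid, combined projection
}


def total(option: dict) -> int:
    """Sum every cost component of one option."""
    return sum(option.values())


def savings_vs(baseline: str) -> dict:
    """Annual saving of each option relative to the baseline option."""
    base = total(OPTIONS[baseline])
    return {name: base - total(parts) for name, parts in OPTIONS.items()}


if __name__ == "__main__":
    for name, saving in savings_vs("A").items():
        print(f"Option {name}: ${total(OPTIONS[name])}k/year, saving vs A ${saving}k")
```

Option B's $360k saving against Option A matches the figure reported; Option C would save $310k on the same basis.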
Independent · Confidential · Not affiliated with Oracle Corporation
The five buyer-side moves on HeatWave GenAI
Move 1 — Right-size the HeatWave cluster against the combined workload. The bundled inference makes the shape choice the dominant cost driver. Model the database load plus the inference load plus the vector store load against the cluster capacity. Don't undersize.
Move 2 — Benchmark the bundled model accuracy against the use case. The bundled Llama and Mistral variants are sufficient for many enterprise classification, summarisation, and embedding workloads. Benchmark on the actual customer data before committing to a premium closed-weight provider.
Move 3 — Maintain the multi-cloud arbitrage. HeatWave runs on OCI, AWS, and Azure with consistent GenAI capabilities. The cross-cloud portability is the negotiating leverage against any one provider's commercial position.
Move 4 — Govern the shape escalation contractually. The shape-based bundle is the commercial advantage, but automatic shape escalation can erode the economics quickly. Cap the maximum scaled shape contractually and require notification of scale events.
Move 5 — Route the closed-weight inference to the right provider. For the subset of workload requiring GPT-4 or Claude, the hybrid architecture (HeatWave GenAI for bundled models, external provider for closed-weight) gives the cost efficiency of the bundle and the capability of the premium provider. Don't route everything to the highest-cost option.
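Move 2 can be run as a small harness: score the bundled model and the premium candidate against the same labelled sample of customer data, then compare. The 95% acceptance bar is an illustrative assumption (the case study's bundled model landed at 96% of Command R). The predictions would come from whatever inference calls the team already uses, and none are shown here.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty prediction and label lists")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def relative_accuracy(candidate: Sequence[str], reference: Sequence[str],
                      labels: Sequence[str]) -> float:
    """Candidate model accuracy as a fraction of the reference model's accuracy."""
    ref = accuracy(reference, labels)
    if ref == 0:
        raise ValueError("reference model scored zero; ratio undefined")
    return accuracy(candidate, labels) / ref


def meets_threshold(candidate: Sequence[str], reference: Sequence[str],
                    labels: Sequence[str], threshold: float = 0.95) -> bool:
    """True if the bundled candidate reaches the assumed acceptance bar."""
    return relative_accuracy(candidate, reference, labels) >= threshold
```

Exact-match accuracy suits classification. Summarisation or generation workloads would need a different scoring function behind the same comparison.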
Frequently asked questions
Is HeatWave GenAI included with HeatWave MySQL on OCI?
HeatWave GenAI is bundled with HeatWave MySQL on OCI at no separate per-token uplift — the in-database LLM and vector store capabilities are included in the HeatWave consumption rate. Customers pay the standard HeatWave shape-based hourly price and gain access to the bundled foundation models (Mistral, Llama variants), the in-database vector store, the in-database scale-out LLM inference, and the natural-language SQL interface. This is a material differentiator versus the Oracle Generative AI Service consumption model — HeatWave GenAI does not separately meter per-token inference for the included models.
Can I bring my own model (BYOM) to HeatWave GenAI?
HeatWave GenAI supports a Bring Your Own Model pattern for fine-tuning the bundled foundation models with customer training data, and for loading additional open-weight models from the supported model catalogue (an expanding list including Llama family variants, Mistral, Falcon, and similar). The customer can load the model artifact into the HeatWave cluster and run inference against the same shape-based consumption rate — without separate per-token billing. Arbitrary third-party closed-weight models (GPT-4, Claude) are not supported in BYOM — customers wanting access to those models would use HeatWave GenAI as the in-database layer and call out to the external provider for the closed-weight inference.
How does HeatWave GenAI pricing compare with Oracle Generative AI Service?
HeatWave GenAI prices on the HeatWave shape-based hourly rate with no separate per-token uplift for the bundled models. Oracle Generative AI Service prices on a per-token rate (on-demand) or a per-cluster-hour rate (dedicated AI cluster). The commercial implications differ significantly. For high-volume sustained workloads against the bundled HeatWave model catalogue, HeatWave GenAI typically wins on per-query economics: the shape covers inference against the bundled models with no per-token meter, bounded only by cluster throughput. For workloads requiring premium models (Cohere Command R+, Llama 3 70B at production scale), or for variable workloads without sustained throughput, the OCI Generative AI Service consumption model frequently wins. The buyer-side analysis should model both paths.
Can HeatWave GenAI run on AWS or Azure?
Yes — HeatWave is available as a managed service on OCI, AWS, and Azure. The HeatWave GenAI capabilities are part of the HeatWave platform across all three cloud providers. The consumption model (shape-based hourly) is consistent across the providers, with the absolute hourly rates varying by region and provider. Customers running HeatWave on AWS or Azure can deploy the HeatWave GenAI capabilities against the same MySQL workload without migrating to OCI. The commercial commitment vehicles differ — OCI HeatWave consumption draws against Universal Credits, AWS HeatWave consumption draws against AWS commitment or pay-as-you-go, and Azure HeatWave consumption draws against the Microsoft Enterprise Agreement or pay-as-you-go.
Where HeatWave GenAI sits in the broader AI infrastructure conversation
HeatWave GenAI bundles the in-database LLM capability with the MySQL HeatWave shape, but the economics of adjacent AI workloads outside the HeatWave cluster still depend on the underlying GPU class, and that is where the buyer-side benchmark turns. Customers running large embedding generation, fine-tuning, or dedicated AI cluster workloads on OCI should right-size against the per-shape commercial framework set out in our Oracle Cloud GPU SKUs pricing analysis. Customers evaluating HeatWave GenAI as part of a broader AI platform commitment with Oracle should benchmark the bundled-platform commercial pressure described in our Oracle AI Data Platform licensing analysis, and audit the framework integration patterns against our LangChain Oracle licensing risk analysis before signing a multi-product commitment.