Choose the plan that fits your inference workload. Upgrade or downgrade any time. No lock-in, no hidden fees.
For teams deploying their first production AI model and getting familiar with inference infrastructure.
For high-growth AI teams that run high-volume inference workloads and need reliability and enterprise-grade features.
For organizations requiring dedicated infrastructure, compliance, and custom deployment architecture.
Yes. Upgrades are instant. Downgrades take effect at the end of your current billing cycle. No penalties or lock-in.
Annual plans are billed upfront for 12 months. The Scale plan is $5,990/year, equivalent to 10 months at the $599 monthly rate, so you get 2 months free.
We will notify you when you reach 80% of your plan's included requests. Requests beyond that allowance are billed at a per-million-request rate until your next billing cycle begins. Enterprise plans have no hard limits.
We offer a 14-day evaluation for Enterprise customers. For self-serve plans, the Starter tier is designed to be accessible to early-stage teams. Contact us to discuss your needs.
Talk to our team. We will help you identify the right starting point based on your model count, request volume, and latency requirements.