Crusoe Cloud
Pricing
Flexible pricing options to meet your needs. Choose reserved, on-demand, or spot pricing for GPU instances, and pay-as-you-go or provisioned throughput for Crusoe Managed Inference.

Built for value, scalability & speed
Cost-effective performance
Our AI-optimized hardware and lightweight virtualization cut waste and unlock performance, so you get more done with less.
Growth-aligned commitments
Avoid GPU lock-ins with tailored agreements that scale with your needs and budget.
Flexible consumption models
Access our full portfolio of LLMs and generative models with pay-as-you-go pricing, or use our GPU and CPU offerings with spot, on-demand, or reserved pricing options.
Pricing for compute instances and managed AI services
GPU instance pricing
Access the latest high-performance GPUs including NVIDIA GB200 NVL72 and AMD MI355X. Pay by the hour for maximum agility and unthrottled compute, or contact us to lock in guaranteed resources at our lowest rates.
CPU instance pricing
Ideal for data processing, model checkpointing, and orchestrating your GPU clusters. Choose from a variety of vCPU and RAM configurations.
General-purpose: $0.04/vCPU-hr
Storage-optimized: $0.09/vCPU-hr
Storage
Reliable, low-latency storage designed to handle the massive datasets and high-throughput demands of modern AI workloads.
Persistent disks: $0.08 per GiB/month
Shared disks: $0.07 per GiB/month
Managed Kubernetes
A fully managed cluster that simplifies deployment and scaling of your AI applications across GPU and CPU resources.
Cluster pricing: $0.10 per cluster hour
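Taken together, the instance, storage, and cluster rates above make rough monthly estimates straightforward. A minimal Python sketch, assuming a hypothetical configuration (8 general-purpose vCPUs, 500 GiB of persistent disk, one managed cluster, and a 730-hour month):

```python
# Hypothetical monthly estimate built from the published rates above.
# The configuration (8 vCPUs, 500 GiB, 1 cluster) is an illustrative assumption.
VCPU_RATE = 0.04       # $/vCPU-hr, general-purpose CPU instance
DISK_RATE = 0.08       # $/GiB-month, persistent disk
CLUSTER_RATE = 0.10    # $/cluster-hr, managed Kubernetes
HOURS_PER_MONTH = 730  # average hours in a month

def monthly_estimate(vcpus: int, disk_gib: int, clusters: int = 1) -> float:
    compute = vcpus * VCPU_RATE * HOURS_PER_MONTH
    storage = disk_gib * DISK_RATE
    k8s = clusters * CLUSTER_RATE * HOURS_PER_MONTH
    return round(compute + storage + k8s, 2)

print(monthly_estimate(vcpus=8, disk_gib=500))  # → 346.6
```

GPU instance rates are quoted separately (and vary by reservation type), so they are deliberately left out of this sketch.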
Managed inference (pay as you go)
Seamlessly integrate the industry's leading Large Language Models (LLMs) and generative models into your applications with flexible pay-as-you-go pricing.
Prices per 1M tokens: Input / Output / Cached

DeepSeek R1 0528:               $1.35 / $5.40 / $0.68
DeepSeek V3 0324:               $0.50 / $1.50 / $0.25
GPT-OSS 120B:                   $0.15 / $0.60 / $0.08
Llama 3.3 70B Instruct:         $0.25 / $0.75 / $0.13
Qwen3 235B A22B Instruct 2507:  $0.22 / $0.80 / $0.11
Gemma 3 12B:                    $0.08 / $0.30 / $0.04
Kimi-K2 Thinking:               $0.60 / $2.50 / $0.30
Managed inference (provisioned throughput)
Ensure guaranteed throughput for your generative AI applications. Provisioned throughput is transacted via AI Model Units (AMUs). The longer your commitment, the lower your cost. Contact sales to learn more about provisioned throughput.
Frequently asked questions


What is the difference between on-demand and spot pricing?
On-demand pricing is our most flexible option: instances are priced hourly, billed by the minute, and carry no minimum commitment, making it ideal for workloads where uptime, predictability, and stability are critical. Spot pricing offers significant discounts, but is better suited to fault-tolerant workloads that can be stopped and restarted without major disruption.
Do you offer discounts for reserved capacity?
Yes. Reserved capacity is a custom agreement where you commit to a specific resource volume for a defined period, resulting in our deepest possible discounts and guaranteed resource availability. Contact our sales team to discuss options.
Are there setup fees for on-demand instances?
No. There are no upfront setup fees for on-demand GPU or CPU instances. Our billing is transparent; you only pay for the resources you consume.
How are instances billed?
All Crusoe Cloud GPU and CPU instances are billed by the minute.
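As a concrete illustration of per-minute billing, the hourly rate is simply prorated to the minute. A small sketch, using the general-purpose CPU rate listed above and a hypothetical 4-vCPU, 150-minute run:

```python
# Per-minute billing: the hourly rate is prorated to the minute.
# The workload (4 vCPUs for 150 minutes) is a hypothetical example.
VCPU_RATE_PER_HOUR = 0.04  # $/vCPU-hr, general-purpose CPU instance

def instance_cost(vcpus: int, minutes: int) -> float:
    return round(vcpus * VCPU_RATE_PER_HOUR * minutes / 60, 4)

print(instance_cost(vcpus=4, minutes=150))  # → 0.4
```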
Do you charge for network ingress or egress?
At this time, Crusoe Cloud does not charge for network ingress or egress, either within a VPC or to/from the public internet.
How do I get the best rates?
The best rates are achieved through our reserved capacity option. These are custom, tailored contracts designed to align with your project’s timeline and budget. Contact our sales team to secure guaranteed discounts compared to on-demand rates.
Is there a minimum commitment for reserved capacity?
While longer-term commitments naturally offer deeper savings, our commitment structures are tailored to your needs. We avoid rigid, one-size-fits-all contracts so that modern AI teams keep the agility they require. Speak with sales to design the minimum commitment that works for your roadmap.
What makes Crusoe’s infrastructure AI-optimized?
Our infrastructure is purpose-built for AI. We deploy the latest high-interconnect NVIDIA GPUs, high-performance networking built on industry best practices for RDMA, and low-latency storage, and we proactively monitor infrastructure to detect and remediate issues before they impact your workloads. This combination eliminates data bottlenecks and improves reliability, so your models train faster and more efficiently at scale.
How does Managed Inference billing work?
Crusoe Managed Inference uses a usage-based, pay-as-you-go model billed per 1 million tokens. Input tokens are the text your application sends to the model; output tokens are the text the model generates in response. Cached tokens apply when the model reuses previous context or prompts, and are typically billed at a much lower rate.
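The token arithmetic can be sketched as follows. The rates are the GPT-OSS 120B pay-as-you-go rates from the table above; treating cached tokens as a discounted subset of the input tokens is an assumption, not a confirmed billing rule:

```python
# Pay-as-you-go inference billing: per 1M tokens, split across
# input / output / cached rates (GPT-OSS 120B rates from the table above).
# Assumption: cached tokens replace the input rate for the cached portion.
INPUT_RATE = 0.15   # $ per 1M input tokens
OUTPUT_RATE = 0.60  # $ per 1M output tokens
CACHED_RATE = 0.08  # $ per 1M cached tokens

def inference_cost(input_tokens: int, output_tokens: int,
                   cached_tokens: int = 0) -> float:
    m = 1_000_000
    billable_input = input_tokens - cached_tokens
    return round(
        billable_input / m * INPUT_RATE
        + cached_tokens / m * CACHED_RATE
        + output_tokens / m * OUTPUT_RATE,
        4,
    )

# 2M input tokens (0.5M served from cache) and 1M output tokens:
print(inference_cost(2_000_000, 1_000_000, cached_tokens=500_000))  # → 0.865
```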
Are you ready to build something amazing?
