Webinar
Inside the cluster: How AI teams actually orchestrate at scale
Speakers:

Nikhil Gupta
Senior Software Engineer, Managed Orchestration, Crusoe

Young Jeong
Staff Solutions Engineer, Crusoe
Join Nikhil Gupta, Senior Software Engineer on Crusoe's Managed Orchestration team, and Young Jeong, Staff Solutions Engineer at Crusoe, for a unique practitioner session.
Nikhil is making the case for Kubernetes. Young is making the case for Slurm. And both of them know the other is right. Together, they'll share what they've learned across dozens of customer implementations — and show you exactly how the best teams stop running two resource pools and start running one.
What we'll cover:
- Why Slurm and Kubernetes both belong in your stack — and how to stop running them as two separate problems
- Patterns that actually work in production: Slurm-on-K8s, partition isolation, and scheduler federation
- What breaks at scale, and how to build clusters that automatically recover without waking anyone up
- A live walkthrough of Crusoe Managed Slurm on Crusoe Managed Kubernetes, with AutoClusters, Autoscaler, and Command Center observability