May 1, 2025

Crusoe powers PyTorch innovation: driving AI performance and efficiency

Crusoe Team
January 16, 2026

At Crusoe, we're committed not only to providing enterprise-scale AI infrastructure but also to actively contributing to the open-source community that drives the future of AI. We're proud to collaborate with the PyTorch community and to work alongside leading researchers and engineers to push the boundaries of AI performance and efficiency.

Our Crusoe Cloud platform, purpose-built for the most demanding AI workloads, provides an ideal environment for developing and testing these advancements. Here are two recent examples of how Crusoe and the PyTorch community are working together:

Accelerating Large-Scale Training with PyTorch Float8 Rowwise

A collaboration between Meta and Crusoe explored the benefits of PyTorch's new Float8 rowwise training on Crusoe's 2K H200 clusters in Iceland. The results were impressive: training ran 34-43% faster at scale than with BF16, with comparable convergence and stability.

This work detailed how Float8 rowwise, which scales each row independently instead of applying a single scale to the whole tensor, improves quantization precision for large-scale workloads. By leveraging Crusoe's H200 infrastructure, the team was able to demonstrate the real-world impact of this PyTorch innovation.
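To make the precision argument concrete, here is a minimal pure-Python sketch that compares a single per-tensor scale with per-row scales. It is an illustration only and not PyTorch's implementation: the `fake_e4m3` rounding helper, the function names, and the sample matrix are all assumptions introduced for this example, and the rounding only approximates the e4m3 format (largest finite value 448, three mantissa bits).

```python
import math

E4M3_MAX = 448.0        # largest finite value in float8 e4m3
E4M3_MIN_STEP_EXP = -9  # e4m3 subnormals are spaced 2**-9 apart


def fake_e4m3(x: float) -> float:
    """Round x to the nearest value on a simulated e4m3 grid (hypothetical helper)."""
    x = max(-E4M3_MAX, min(E4M3_MAX, x))
    if x == 0.0:
        return 0.0
    _, exp = math.frexp(x)  # x = m * 2**exp with 0.5 <= |m| < 1
    # Three mantissa bits give 8 steps per binade, so the step is 2**(exp - 4).
    step = 2.0 ** max(exp - 4, E4M3_MIN_STEP_EXP)
    return max(-E4M3_MAX, min(E4M3_MAX, round(x / step) * step))


def quant_dequant(values, amax):
    """Scale values so amax maps to E4M3_MAX, round, then scale back."""
    scale = E4M3_MAX / max(amax, 1e-12)  # guard against an all-zero input
    return [fake_e4m3(v * scale) / scale for v in values]


def tensorwise(matrix):
    """One scale for the whole matrix."""
    amax = max(abs(v) for row in matrix for v in row)
    return [quant_dequant(row, amax) for row in matrix]


def rowwise(matrix):
    """One scale per row."""
    return [quant_dequant(row, max(abs(v) for v in row)) for row in matrix]


def max_rel_error(original, approx):
    """Largest relative error over all (nonzero) entries."""
    return max(
        abs(a - o) / abs(o)
        for row_o, row_a in zip(original, approx)
        for o, a in zip(row_o, row_a)
    )


# Assumed sample data: two rows whose magnitudes differ by six orders.
matrix = [[1000.0, -750.0, 500.0], [0.001, -0.002, 0.0015]]
tensor_err = max_rel_error(matrix, tensorwise(matrix))
row_err = max_rel_error(matrix, rowwise(matrix))
print(f"tensorwise max relative error: {tensor_err:.3f}")
print(f"rowwise max relative error:    {row_err:.3f}")
```

With a single scale, the small-magnitude row is pushed below the smallest representable step and is flushed to zero, while per-row scales keep both rows within the representable range. Real workloads rarely show a gap this extreme; the sample values are chosen to make the effect visible.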

Boosting Checkpointing Efficiency with PyTorch DCP

Another collaboration between Meta and Crusoe focused on optimizing checkpointing efficiency with PyTorch Distributed Checkpointing (DCP). Using Crusoe's 2K H200 cluster and TorchTitan, the team verified that new DCP optimizations significantly reduced background processing time for asynchronous checkpoints, from approximately 436 seconds to 67 seconds at a scale of 1,856 GPUs.

This research highlighted how asynchronous checkpointing minimizes GPU downtime during the saving of training progress, a crucial factor for large-scale AI training. Crusoe's infrastructure played a key role in validating these improvements and their potential to save valuable training time.
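The general pattern behind asynchronous checkpointing can be sketched with the Python standard library alone. This is a conceptual illustration under stated assumptions, not the DCP API: `async_checkpoint`, `_persist`, and `demo` are hypothetical names, a plain dict stands in for model state, and a JSON file stands in for checkpoint storage. The training loop blocks only for a quick in-memory snapshot, and the slow write happens in a background thread.

```python
import copy
import json
import os
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor

# A single worker keeps checkpoint writes ordered (an assumption of this sketch).
_executor = ThreadPoolExecutor(max_workers=1)


def _persist(snapshot: dict, path: str) -> str:
    """Background step: write the snapshot, then atomically move it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(snapshot, f)
    os.replace(tmp_path, path)  # readers never see a half-written file
    return path


def async_checkpoint(state: dict, path: str) -> Future:
    """Foreground step: take a snapshot, then hand the write to a worker thread."""
    snapshot = copy.deepcopy(state)  # the only part that blocks training
    return _executor.submit(_persist, snapshot, path)


def demo() -> dict:
    """Start a checkpoint, keep 'training', then read back what was saved."""
    state = {"step": 100, "weights": [0.1, 0.2, 0.3]}
    with tempfile.TemporaryDirectory() as ckpt_dir:
        future = async_checkpoint(state, os.path.join(ckpt_dir, "ckpt.json"))
        # Training continues and mutates the live state while the write runs.
        state["step"] = 101
        state["weights"][0] = 9.9
        with open(future.result()) as f:  # wait for the background write
            return json.load(f)


saved = demo()
print(saved)
```

Because the snapshot is taken before the write starts, later updates to the live state do not leak into the saved checkpoint. In this sketch, the time spent inside `_persist` corresponds to the background processing time discussed above.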

These are just two examples of how Crusoe is working hand in hand with the PyTorch community to accelerate AI development. Together, we're excited to keep pushing the limits of what's possible and to empower the next generation of AI innovation.
