With open-source models growing larger and larger (the largest Llama 3 variant clocks in at 405B parameters), the challenge and cost of running them are even more pronounced. Luckily, we can easily run high-performance distributed inference on low-cost NVIDIA L40S instances with Crusoe Cloud.