By: Robert Thompson - Systems and Software Engineer at Avalanche Energy, Kate Kelly - Chief of Staff at Avalanche Energy
Introduction
To achieve net-zero emissions and energy resiliency, the world needs scalable, renewable, and sustainable clean energy. Avalanche Energy is advancing this effort through its mission to provide first-of-its-kind fusion microreactors. Since Avalanche's work began in June 2021, it has developed and operated two high-voltage reactor prototypes, achieving significant performance steps over the last two years. Avalanche's modular, desk-sized reactor can be stacked for near-endless power applications and unprecedented energy density. This approach enables rapid iterations of design, build, test, and fix cycles for fast development and scalability. Avalanche's technology will have broad implications for energy independence and national security, opening the door to a wide range of applications, including carbon-free energy generation, advanced space propulsion, microgrids, and transportation.
An important part of the design process at Avalanche is rapid simulation. Avalanche uses particle-in-cell plasma simulation technologies to rapidly evaluate and analyze different configurations of our fusion microreactor to inform in-lab experiments that can start within hours of completing a simulation. Avalanche chose a multicloud approach for these simulations: the workloads are orchestrated with Amazon Web Services (AWS), and the underlying simulations run on Crusoe Cloud, a climate-aligned cloud computing platform from Crusoe Energy.
Avalanche's Simulation Workflow Architecture
Avalanche's key areas of focus were scalability, flexibility, collaboration, and multivariate analysis.
Scalability:
A compute platform needed to support a growing team of computational physicists and experimentalists seeking to run multiple simulations on an ongoing basis.
The platform needed to automate repetitive, highly manual tasks to speed up design, build, test, and fix cycles, including:
Bringing structured data from multi-variable simulation parameter analyses together to allow for rapid evaluation and fine tuning.
Starting, monitoring, and tracking multiple unique simulation parameter analyses running concurrently.
The platform needed to manage large amounts of data quickly and reliably to support rapid iteration, so a strong network capable of moving this data was essential.
Flexibility:
The ability to run the highly specialized simulations that are necessary to unlock fusion. Changes to the simulation parameters can move performance bottlenecks to very different points in the simulation. A platform that offers different configurations of computing and memory capacity lets us rapidly adjust to our varying performance requirements.
Collaboration & Multivariate Analysis:
A new platform needed to support and expand Avalanche’s culture of collaborative iteration. As the company grows, the complexity of simulations continues to increase; having a streamlined, collaborative process to design, build, test and fix simulations was critical to support efficiency.
The ability to orchestrate ensembles of simulation runs (sweeps, Monte Carlo runs, simultaneous experiments) on an ongoing basis was essential.
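As a rough illustration of what an ensemble looks like before it is dispatched, the sketch below expands a parameter grid into a sweep and draws random points for a Monte Carlo ensemble. It is a minimal sketch, not Avalanche's code: the parameter names (`voltage_kv`, `pressure_mtorr`), the run ID format, and the helper functions are all hypothetical.

```python
import itertools
import random


def grid_sweep(parameter_grid):
    """Yield one parameter dict per point in the Cartesian product of the grid."""
    names = sorted(parameter_grid)
    for values in itertools.product(*(parameter_grid[name] for name in names)):
        yield dict(zip(names, values))


def monte_carlo_sweep(parameter_ranges, n_samples, seed=0):
    """Yield n_samples parameter dicts drawn uniformly from (low, high) ranges."""
    rng = random.Random(seed)  # fixed seed so an ensemble can be regenerated
    for _ in range(n_samples):
        yield {
            name: rng.uniform(low, high)
            for name, (low, high) in sorted(parameter_ranges.items())
        }


def label_runs(ensemble_name, points):
    """Attach a unique, sortable run ID to every parameter point."""
    return [
        {"run_id": f"{ensemble_name}-{index:04d}", "params": params}
        for index, params in enumerate(points)
    ]


# Hypothetical parameters, chosen only to illustrate the shape of a sweep.
grid = {"voltage_kv": [100, 200, 300], "pressure_mtorr": [0.1, 1.0]}
sweep_runs = label_runs("sweep-demo", grid_sweep(grid))

ranges = {"voltage_kv": (100.0, 300.0)}
mc_runs = label_runs("mc-demo", monte_carlo_sweep(ranges, n_samples=5))
```

Each entry carries its own run ID and parameters, so concurrent runs from different ensembles can be started, monitored, and tracked independently.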
Avalanche used Apache Airflow to dispatch and monitor containerized simulations, which were submitted to an Amazon ECS Anywhere cluster consisting of Crusoe's accelerated instances (with the Amazon ECS Anywhere agent installed) in our remote, digital-flare-mitigated compute environment. The outputs of the simulations were then uploaded to Amazon S3, with metadata stored in Airtable. Avalanche then used Jupyter for visualization and analytics to iterate through the ensembles and inform the next set of experiments. Avalanche runs, on average, 10 jobs a day, with each job producing 1 to 100 GB of data. On peak days, Avalanche has seen 20 jobs at a time.
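To make the dispatch step more concrete, here is a minimal sketch of the request an orchestrator task might send to Amazon ECS for one simulation run. The `EXTERNAL` launch type is what ECS uses for ECS Anywhere instances. Everything else is an assumption made for illustration: the cluster, task definition, container, and bucket names, and the environment variables used to pass parameters into the container.

```python
import json


def build_run_task_request(cluster, task_definition, container_name,
                           run_id, params, output_bucket):
    """Build keyword arguments for the ECS RunTask API for one simulation run."""
    environment = [
        {"name": "RUN_ID", "value": run_id},
        # Parameters are serialized so the container can parse them at startup.
        {"name": "SIM_PARAMS", "value": json.dumps(params, sort_keys=True)},
        # Hypothetical convention: one S3 prefix per run for its outputs.
        {"name": "OUTPUT_URI", "value": f"s3://{output_bucket}/{run_id}/"},
    ]
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "EXTERNAL",  # ECS Anywhere (externally registered) instances
        "count": 1,
        "overrides": {
            "containerOverrides": [
                {"name": container_name, "environment": environment}
            ]
        },
    }


# All names below are placeholders, not Avalanche's actual resources.
request = build_run_task_request(
    cluster="simulation-cluster",
    task_definition="pic-simulation",
    container_name="simulation",
    run_id="sweep-demo-0000",
    params={"voltage_kv": 100, "pressure_mtorr": 0.1},
    output_bucket="example-simulation-outputs",
)

# With AWS credentials configured, the request could be submitted with boto3:
#   import boto3
#   boto3.client("ecs").run_task(**request)
```

Keeping request construction separate from submission makes the payload easy to inspect and test without touching a live cluster. In an Airflow DAG, the same request could be issued from a task for each run in an ensemble.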
Avalanche chose to run the compute on Crusoe for flexible access to top-tier compute hardware and climate-aligned computing infrastructure. Avalanche was able to use differently sized accelerator-backed instances with different GPU/CPU and memory ratios. This allows Avalanche to scale and manage costs based on simulation needs, a critical consideration for an early-stage startup!
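One simple way to think about matching instance shapes to simulation needs is to pick the cheapest shape that satisfies a job's minimum GPU, CPU, and memory requirements. The sketch below illustrates that idea only. The catalog entries, their names, sizes, and hourly costs are invented placeholders and do not describe real Crusoe Cloud instance types or prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceShape:
    name: str
    gpus: int
    vcpus: int
    memory_gib: int
    hourly_cost: float  # placeholder units, not real pricing


def cheapest_fit(shapes, min_gpus, min_vcpus, min_memory_gib):
    """Return the lowest-cost shape meeting all minimums, or None if none fits."""
    candidates = [
        shape for shape in shapes
        if shape.gpus >= min_gpus
        and shape.vcpus >= min_vcpus
        and shape.memory_gib >= min_memory_gib
    ]
    return min(candidates, key=lambda shape: shape.hourly_cost, default=None)


# Invented catalog, for illustration only.
catalog = [
    InstanceShape("shape-a", gpus=1, vcpus=8, memory_gib=64, hourly_cost=1.0),
    InstanceShape("shape-b", gpus=2, vcpus=16, memory_gib=128, hourly_cost=2.0),
    InstanceShape("shape-c", gpus=4, vcpus=32, memory_gib=512, hourly_cost=5.0),
]

# A memory-heavy single-GPU job is pushed to a larger shape by its memory needs.
memory_heavy = cheapest_fit(catalog, min_gpus=1, min_vcpus=8, min_memory_gib=100)
too_large = cheapest_fit(catalog, min_gpus=8, min_vcpus=8, min_memory_gib=64)
```

Here `memory_heavy` resolves to `shape-b` and `too_large` is `None`, which a scheduler could treat as a signal to split the job or request a different shape.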
Furthermore, Crusoe offered a flexible set of options, as jobs needed different GPU topologies for specific simulation scenarios. Ultimately, the commitment to carbon-capture-backed compute was a huge plus for running the compute-bound portion of the architecture on Crusoe Cloud. In summary, AWS gives us the software to manage multiple clouds and build the system we want, while Crusoe gives us the compute power and networking capabilities to run at scale on clean energy.
To learn more about Avalanche Energy, visit avalanche.energy. To learn more about Crusoe's cloud compute offering and to get started with running your simulation workloads on Crusoe, please visit crusoecloud.com.