Sep 04, 2024

Partnering to Enable the Next Generation of AI, Real-Time Multimodal Intelligence with Cartesia

Crusoe Cloud customer Cartesia is on a mission to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. This intelligence can drive human-like experiences in a fast, multimodal, and compute-efficient way, enabled by Cartesia’s pioneering work on state space models (SSMs). At Crusoe, our goal is not only to help Cartesia succeed today, but also to enable the future of its innovation.

Cartesia recently released Sonic, its state-of-the-art text-to-speech model. Sonic creates high-quality, lifelike speech for any voice with sub-120 ms latency, the fastest for a model of this class. Cartesia built and optimized its own SSM inference stack to serve Sonic with low latency and high throughput at scale without impacting quality. Sonic offers multilingual support and features a unique voice ecosystem in which users can generate audio using voices from a diverse library suited to applications such as customer support, gaming, entertainment, and content creation. Users can also customize voices in the native voice design studio, which supports instant cloning and voice design. Sonic has been adopted across a wide array of verticals and will continue to power use cases that need fast, reliable text-to-speech.

Cartesia’s platform has enabled developers to build real-time multimodal AI systems, most recently through its work to bring its models to the edge. This effort lets users run Cartesia’s models, including Sonic and other pretrained language models, directly on their own devices. Cartesia’s work on efficient ML models will open the door to interactive AI experiences that run locally on any device in a fast, secure, and personalized way.

As an IaaS provider, Crusoe supported the training of Cartesia’s model on a cluster of H100 GPUs. Cartesia chose Crusoe for its 1) optimal price/performance offering; 2) responsive support team to ensure success; and 3) commitment to reliability at scale. To date, Cartesia has expanded its partnership with Crusoe multiple times, tripling in size within the same InfiniBand (IB) fabric and expanding its storage instances. Furthermore, in partnering with Crusoe, Cartesia has been able to customize the offering it needs to be successful. For example, when Cartesia expressed interest in SLURM, Crusoe did not yet have the in-house expertise, so Crusoe brought on a subject matter expert (SME) to develop a SLURM cluster that supports not only Cartesia but also other customers in the future.

As the world of AI offerings, use cases, hardware, and platforms continues to evolve at lightning speed, relationships like this one, rooted in nimbleness, high performance, and customer success, are crucial in an ever-changing landscape. So much of the competitive advantage of AI models depends on time to market, and Crusoe was thrilled to support Cartesia in this effort. Looking ahead, Cartesia will continue to push boundaries in real-time multimodal intelligence, and Crusoe will continue to be a great partner in enabling the future of innovation.

About Crusoe

Crusoe is a vertically integrated AI infrastructure company that takes an “energy first” approach to engineering, building, and scaling our solutions, including our commercially available, purpose-built AI cloud, Crusoe Cloud. By unlocking stranded sources of energy and building infrastructure to power our AI cloud, Crusoe is creating the building blocks of future AI innovation from the ground up.

About Cartesia

At Cartesia, we are pioneering the model architectures that will make it possible to build the next generation of AI: ubiquitous, interactive intelligence that runs on every device wherever you are. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models, or SSMs, a fundamental new primitive for training large-scale foundation models. Over the past four years, we’ve built the theory behind SSMs and scaled it up to achieve state-of-the-art results in modalities as diverse as text, audio, video, images, and time-series data.

