Jul 29, 2024

The Future of AI Depends on Building Infrastructure Today


Crusoe Co-founder and CEO, Chase Lochmiller in conversation with Edelman Smithfield EVP & Head of AI Strategy, Chris Donahoe.

We all have a vision for what a future world looks like with AI. Broadly, artificial general intelligence will be more pervasive, and AI will be incorporated into the daily mechanics of business: conducting analyses, writing code, building products, selling products, optimizing, supporting customers, coordinating across teams and organizations, making strategic decisions. However, a big question mark remains: how do we get from here to there? The infrastructure needed to support that level of AI adoption is unprecedented in our lifetimes. AI has the potential to be the largest capital investment in infrastructure in human history.

Full Transcript Below:

0:11 Chase, really excited to speak with you this afternoon.

0:16 I think over the past day and a half, we've been putting together different pieces of the puzzle for how you build an enterprise AI strategy, how you deploy it and scale it and make it work, govern it.

0:32 I think what's so interesting to me about the work you and your team are doing at Crusoe is you guys are kind of placing the last piece of the puzzle into that framework, this infrastructure piece.

0:44 Can you walk us through the problem you set out to solve, what you're seeing in the field, and why you think infrastructure considerations need to have a seat at the table for these kinds of conversations?

1:00 Yeah, absolutely.

1:01 So, you know, just starting off by introducing Crusoe: we are a sustainability-focused, vertically integrated AI infrastructure business.

1:10 So building everything from sourcing the energy to building the data centre and operating a high performance managed cloud services platform.

1:19 And I think, you know, when you look at AI and its potential to transform the world, certainly the demand is very clear.

1:29 And the value creation is very clear in terms of driving massive productivity gains in how we work, how we operate, and just everywhere across the entire human experience and the work and productivity stacks.

1:44 So very, very clear value proposition for enterprise, which is driving tremendous demand.

1:50 And you know, because AI is sort of this digitally native experience, I think oftentimes we overlook the actual physical infrastructure impact of AI.

2:01 And, you know, I had this professor at Stanford, Andrew Ng. Many are probably familiar with him.

2:06 He has this saying that AI is like electricity.

2:08 You can't actually name one thing that electricity is good for.

2:12 It's sort of good for everything, right?

2:13 It lights the room, it powers cars, it, you know, enables industry.

2:17 There's all these incredible things that electricity does for society, and AI will be very similar in terms of its global and broad impact.

2:27 But I like to comment further on Andrew's point, which is that AI is like electricity, and it actually uses a lot of electricity too.

2:34 And so, you know, when we think about the infrastructure layer of AI, to make all of this happen at scale, what we're looking at is the largest infrastructure investment in human history.

2:47 This is like the New Deal on steroids, right?

2:49 It's going to create massive numbers of jobs too.

2:53 I think that's something that's oftentimes overlooked amid the narrative of AI being this great job killer.

2:59 It's going to be the greatest catalyst for the blue-collar workforce that's ever existed.

3:03 The sheer number of electricians and welders and painters and plumbers needed to build out next-generation infrastructure is completely staggering.

3:13 And, you know, I think it's just a very complex problem to solve, and it's trillions of dollars of capital that need to be invested.

3:24 And so this ranges from the obvious things, like new chips coming from NVIDIA and AMD and Intel and, you know, a lot of the interesting startups building new AI accelerators, to new data centres.

3:37 You know, data centre vacancy rates are at an all time low.

3:41 They ended 2023 with less than 2% vacancy across the data centre industry.

3:46 When you think about what that means, these are tiny crumbs of data centre capacity that are available, maybe 500 kilowatts here, a megawatt there, but no real capacity to enable the bigger clusters that are really in demand for AI.

4:01 And then, you know, the final piece really is energy generation infrastructure.

4:06 We are currently pushing up against the limits of the energy that can be utilised for data centres, which is actually driving investment in new generation to power this wave of AI-led computing infrastructure.

4:23 It's a great overview.

4:25 You mentioned this is going to be one of the biggest investments in human history.

4:31 You also alluded to the demand for chips and GPUs, which right now is taking up a huge amount of that investment, and the pace of innovation with hardware, with chips, is incredibly rapid right now.

4:49 Can you give us any advice for how to keep up with that pace of innovation when we're thinking about things like infrastructure planning, which so often requires a lot of lead time?

5:07 Yeah, you know, I think this is one of the areas where leveraging cloud service partners is a major advantage, in that the CSPs are great partners that are always working with the hardware manufacturers and staying current with what the next-generation architectures look like, and can really help people grow as advancements in the hardware ecosystem are rapidly deployed.

5:36 You know, the complexity of these systems has also grown exponentially; it's not a simple network connection per server.

5:44 You're talking about these very complex high performance networking architectures that involve these, you know, interconnected RDMA fabrics.

5:55 So RDMA is remote direct memory access, where you're basically connecting all of the GPUs into a shared cluster and a non-blocking fabric so that you can share data across the GPUs in a very high-performance, high-bandwidth capacity.
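The data-sharing pattern Chase describes, every GPU exchanging data with every other over a non-blocking fabric, is typically realised as a ring all-reduce collective. The toy sketch below shows only the arithmetic of that collective in plain Python; it is an illustration, not how NCCL or RDMA hardware actually implements it (real systems pipeline large chunks so every link stays busy).

```python
# Toy ring all-reduce: each "GPU" holds a gradient vector, and after the
# collective every GPU holds the element-wise sum of all of them.
# For simplicity each GPU's vector has exactly n elements: one "chunk"
# per GPU.

def ring_all_reduce(grads):
    """grads: one n-element list per GPU. Returns each GPU's buffer
    after the collective (all equal to the element-wise sum)."""
    n = len(grads)
    buf = [list(g) for g in grads]

    # Phase 1: reduce-scatter. In step s, GPU i passes chunk (i - s) mod n
    # to its right neighbour, which adds it into its partial sum. After
    # n - 1 steps, GPU i holds the complete sum of chunk (i + 1) mod n.
    for s in range(n - 1):
        for i in range(n):
            chunk = (i - s) % n
            buf[(i + 1) % n][chunk] += buf[i][chunk]

    # Phase 2: all-gather. Each completed chunk travels around the ring,
    # overwriting the stale partial copies on the other GPUs.
    for s in range(n - 1):
        for i in range(n):
            chunk = (i + 1 - s) % n
            buf[(i + 1) % n][chunk] = buf[i][chunk]
    return buf

# Every GPU ends with the same summed vector [12, 15, 18].
print(ring_all_reduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

The appeal of the ring topology is that each GPU only ever talks to its neighbours, so total bandwidth per GPU stays constant as the cluster grows, which is exactly why the fabric's per-link performance matters so much.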

6:10 And you know, basically staying current with like, you know, all of the different components.

6:15 It's not just the GPU: it's the NIC, it's the switch, it's the transceivers and the optics. All of these different elements are very, very complex.

6:26 And so I think this is where, for a lot of big enterprises, leaning on cloud service partners that are very focused on AI is a great strategy for adopting AI infrastructure solutions.

6:41 Do you have a point of view on whether folks should self-host AI tools, host those tools in the cloud, or build purpose-built clouds?

6:55 Yeah, I mean, I think my answer before kind of alludes to how I feel about it.

6:59 But yeah, I really do think that, with the exponential growth in complexity of building clusters of compute infrastructure, because again, it's not just about the GPU, it's not even just about the server, it's about the cluster of compute.

7:14 And, you know, when you're talking about interconnecting these things in a very high-performance way that enables you to unlock the potential of what the hardware's capable of, it's really helpful to have cloud services partners like Crusoe that can help unlock that potential.

7:37 You know, one of the physical manifestations of the AI revolution has been the proliferation of data centres.

7:47 I live in Washington, DC.

7:49 Right outside of Washington is Leesburg, VA.

7:52 There are a gazillion data centres there and it seems like they're building more every day.

7:57 It is really something to behold.

8:00 So AI has kind of turbocharged that demand for data centres. How big do these data centres ultimately need to be?

8:11 How available do they need to be, and how should we think about co-location, or proximity to the demand, as we build out

8:23 these centres?

8:25 That's a great question.

8:26 And what we're seeing unfold is actually a complete platform shift from the way computing was done during the web application era to the way AI computing is done and structured.

8:43 You know, they're very different use cases.

8:44 They're very different applications that require very different infrastructure needs.

8:49 And so on the one hand, what you're seeing is a massive increase in total power capacity, but you're also seeing way higher power densities, which result in a smaller square-footage footprint per MW of deployed capacity.

9:09 But with all that being said, I think it's worth reciting some numbers here.

9:13 So, you know, a traditional data centre rack is anywhere from, you know, 7 to 15 kilowatts per rack.

9:21 When you look at next-generation hardware like the NVIDIA GB200 NVL72 configuration, that requires 120 kilowatts per rack.

9:32 And, and we're looking at even higher power densities for next generation hardware.

9:36 So you're looking at something like a 10x growth in power density per rack.
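The footprint arithmetic behind those figures is straightforward. A rough sketch using the densities quoted above (the rack counts are illustrative; a real layout also needs floor space for cooling, power distribution and aisles):

```python
# Rough footprint comparison per MW of IT load at different rack
# densities. Densities from the conversation: ~7-15 kW per traditional
# rack, ~120 kW per GB200 NVL72-class rack.

import math

def racks_per_mw(kw_per_rack):
    """Racks needed to house 1 MW (1000 kW) of IT load."""
    return math.ceil(1000 / kw_per_rack)

traditional = racks_per_mw(10)    # mid-range of 7-15 kW/rack
next_gen = racks_per_mw(120)      # GB200 NVL72-class density

print(f"~10 kW/rack:  {traditional} racks per MW")   # 100 racks
print(f"~120 kW/rack: {next_gen} racks per MW")      # 9 racks
```

Roughly an order of magnitude fewer racks per megawatt, which is why the same power capacity now fits in a much smaller building, and why the heat it produces becomes so concentrated.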

9:40 This has introduced other complexities in terms of managing the heat and managing the cooling of the system, right?

9:47 Because, you know, you just have a lot more power in a tighter space.

9:52 You need to get rid of that heat in some way, shape or form.

9:56 So these next generation hardware architectures typically require direct to chip liquid cooling.

10:03 So you have to run a cold water loop through the facility. Whether you use a freshwater supply, which can create a massive environmental impact in a different way aside from energy, or a closed-loop system, which is what Crusoe favours, where you basically chill the water in a chilled water loop

10:25 and it feeds back into the system to cool the chips, is another big element to think through in the overall data centre design.
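Sizing such a loop comes down to a simple heat balance: the water flow has to carry away the rack's electrical power, Q = ṁ · c_p · ΔT. A rough sketch with an assumed 10 °C temperature rise across the loop (an illustrative figure, not Crusoe's actual design point):

```python
# Coolant flow needed to remove a rack's heat, from Q = m_dot * cp * dT.
# Assumes all rack power ends up in the water loop; dT is an assumption.

def coolant_flow_lpm(rack_kw, delta_t_c=10.0):
    """Water flow in litres/minute to absorb rack_kw of heat with a
    delta_t_c temperature rise across the loop."""
    cp_water = 4.186                              # kJ/(kg*K)
    kg_per_s = rack_kw / (cp_water * delta_t_c)   # since 1 kW = 1 kJ/s
    return kg_per_s * 60.0                        # ~1 kg of water ~ 1 L

print(f"10 kW traditional rack:  {coolant_flow_lpm(10):5.1f} L/min")
print(f"120 kW GB200 NVL72 rack: {coolant_flow_lpm(120):5.1f} L/min")
```

At 120 kW per rack the loop has to move on the order of 170 litres of water per minute per rack, which is why plumbing becomes a first-class design concern in these facilities.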

10:34 And then the other big piece is that people are very focused on the cluster, right?

10:39 So what you care about for AI is actually the local network.

10:44 So being able to get from one GPU on your network to another GPU on your network in a very tightly networked and tightly coupled capacity.

10:54 And you care less about the latency it takes to get to the data centre.

10:59 So this has resulted in and opened up new geographies for where we can actually build a lot of these next generation AI data centres.

11:07 So, what's an example?

11:08 So Northern Virginia has sort of been the, the data centre corridor, right?

11:12 You know, Ashburn and all these towns have experienced this boom in the data centre industry over the last 20 years.

11:22 But, you know, an area that we're building in, because we take a very energy-first approach to building AI data centres, is an area like West Texas.

11:30 West Texas is an area where there's not a lot of industry today, but it's a place where you can, in a very low-cost and low-impact way, develop renewable energy resources.

11:41 There are tremendous amounts of low-cost wind and solar available in a market like West Texas that enable you to power AI infrastructure in a way that's good for the world and good for the environment, as well as good for your pocketbook, and that drives down your overall costs.

12:00 So I want to stay on this topic of energy intensity and energy consumption for a minute.

12:09 Certainly we've seen headlines and news about the big tech companies and the energy demands and consumption they are employing in the business, and what those growth rates look like.

12:19 But we have a lot of other industries represented in the room here.

12:25 Do you have a view, when we look across various industries, on which might be more exposed to this bottleneck or constraint, or in some cases reputational risk, as it relates to that industry's use of energy for AI?

12:48 Sure.

12:48 So, you know, I think oftentimes it's a bit overlooked: people want to deploy an AI solution.

12:54 Again, it's this inherently digital experience, but it has a massive energy impact behind the curtains.

13:00 I think I saw the other day that when you post a photo to Instagram, you turn on a light bulb for eternity.

13:07 Yeah.

13:08 I mean, that's a good, you know, random analogy that just got my attention.

13:15 Yeah, no, no, it's, it's great.

13:16 And you know, I think that's probably the point of it.

13:19 But, you know, I think what you're seeing with this AI computing paradigm is this mass convergence between energy and computing.

13:31 And, you know, if you reflect back 20 years ago, a large data centre was maybe 10 megawatts.

13:36 You can sort of fit a 10 MW load anywhere on the grid.

13:40 I mean, you can sort of like find a utility.

13:42 It's a big load, but it's not, you know, overwhelming.

13:45 You know, today there are a number of people we're talking to about building GW-scale data centres.

13:51 You know, Sam Altman was very public about trying to build a 10 GW data centre.

13:55 These are such crazy-scale deployments of infrastructure that if you're not thinking about the energy required to power them, you're not going to be able to build them in the first place, let alone build them and also meet your sustainability goals.
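To put that jump in scale into perspective, a quick back-of-the-envelope sketch comparing the 10 MW data centres of 20 years ago to the GW and 10 GW campuses being discussed today. The household comparison assumes roughly 1.2 kW of average continuous US household draw, a ballpark assumption and not a figure from the conversation:

```python
# Scale of the jump: 10 MW (a large data centre ~20 years ago) vs the
# 1 GW and 10 GW campuses discussed today, expressed as the number of
# average households drawing the same continuous power.

AVG_HOUSEHOLD_KW = 1.2   # assumed average continuous US household draw

def households_equivalent(mw):
    """Number of average households drawing the same continuous power."""
    return int(mw * 1000 / AVG_HOUSEHOLD_KW)

for label, mw in [("large DC, ~2004", 10),
                  ("GW-scale campus", 1_000),
                  ("10 GW campus", 10_000)]:
    print(f"{label:>15}: {mw:>6} MW ~ {households_equivalent(mw):>10,} households")
```

A 10 GW campus draws three orders of magnitude more power than the old "large" data centre, roughly the continuous draw of several million homes, which is why the energy question has to come first.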

14:10 So, you know, Crusoe, like our core mission as a business is aligning the future of computing with the future of the climate.

14:16 So how do we actually unlock all of this incredible innovative potential taking place in AI, and how do we enable people to benefit from it without accelerating a climate crisis?

14:28 And that really comes down to the way in which we approach infrastructure development: taking this energy-first approach, partnering with wind, solar, geothermal, hydro, nuclear, and gas generation with carbon capture and sequestration, you know, just unique energy solutions.

14:44 And we actually take the computing to the areas where we have low-cost, clean and abundant energy resources to power it effectively.

14:52 This sort of solution enables people to benefit in two different ways.

14:57 One, they're able to meet their sustainability targets.

15:02 And I think many larger enterprises have big ESG goals that they've set for themselves.

15:08 And, you know, one of the potential risks of AI is that if you play out large-scale adoption, it can cause people to miss those goals.

15:18 And Crusoe's goal is really to enable folks to have their cake and eat it too, in terms of being able to provide high-performance, low-cost infrastructure that is sustainably powered.

15:32 So I think we'd all love to have our cake and eat it too.

15:37 And I think the work you guys are doing, what you're enabling is a really useful tool for a lot of people in the room.

15:46 But what else can the rest of us be doing to support the growth of electricity availability, the amount of power we're going to need for the workloads, and the growth rate of those workloads, that we're all planning and anticipating over the next many years?

16:16 Yeah, I mean, I guess, you know, I come back to this notion that with data centre vacancy rates at an all-time low, any way you shape it, whether you're the biggest AI bull or, you know, a bear on the space.

16:30 We're at capacity today, which means any net new capacity we're bringing on, and there's a lot of it in the queue and a lot of it planned,

16:36 is going to require a lot of net new energy generation.

16:41 I think this is an incredible moment in history for us as an industry to actually be able to shape what the future energy infrastructure looks like to power the future of computing.

16:52 And you know, in a lot of ways this can actually help accelerate a lot of next generation energy technologies that have applications that, you know, go well beyond just data centres and computing infrastructure.

17:04 So are AI companies, or AI-first companies, becoming infrastructure companies whether they realise it or not?

17:17 Yeah.

17:17 I mean, absolutely.

17:21 I mean, there's no way to do AI at any sort of meaningful scale without significant energy requirements.

17:30 And so you really do have to be thoughtful about, you know, how you're sourcing that energy.

17:35 My only point here is I think there's a massive opportunity because cleaner solutions are becoming cheaper.

17:40 And if we can really lean into that as a society and recognise that we don't need every AI data centre to be in Northern Virginia in your backyard, right?

17:50 We can actually have those data centres in many places across the world. To give you another great example, a great area that Crusoe's building in is Iceland, right?

17:58 Iceland is sort of this geological phenomenon: there's low-cost geothermal and hydropower present in this very small country.

18:07 It's only 300,000 people.

18:10 And you know, the aluminium smelters really figured this out.

18:14 And you know, aluminium smelting is a, is a notoriously energy intensive process.

18:18 And what they do is they actually ship raw aluminium ore across the ocean to Iceland.

18:23 They run the smelters there, the very energy-intensive process, and then they ship the finished products out.

18:28 Training a large language model is just a way better manifestation of the same trade, right?

18:32 Moving data to Iceland is very low cost, cheap and effective.

18:36 You can train a big large language model powered by ultra-clean, low-cost geothermal and hydropower, and then ship the finished products, whether it's training or inference, out to where you need them to be.

18:50 So, you know, I think there's great opportunities in terms of how we think about this convergence of energy and compute infrastructure.

18:57 It's a really, really neat example.

19:00 And I love seeing small, unique countries like Iceland finding a way to really provide value in this evolutionary period.

19:11 I want to, I want to wrap this up here by tying together a couple thoughts or ideas or points that keep coming up over the past 24 hours or so.

19:25 One that seems to be emphasised in every conversation is that your AI strategy must be intimately linked.

19:36 It is intimately linked to the business and enterprise strategy that you are already executing.

19:43 And I would love your thoughts, in conclusion, on how we should think about the risk of our AI strategy and our approach to infrastructure as it relates to the AI we're deploying.

20:02 How can we think about infrastructure helping to align the AI strategy to the business strategy, and not upending some of those larger

20:13 enterprise goals that folks may have? Sure.

20:19 So I guess with anything, right, there's always a risk and there's an opportunity.

20:26 And, you know, I think there's such an arms race going on right now in terms of getting more GPUs, getting more capacity, getting more data centre space online, that it's left a lot of people in a position where they've pushed their sustainability goals aside and said, we'll revisit those later.

20:44 Yeah, we've seen that a lot.

20:46 We've seen that a lot.

20:47 I mean, Microsoft is sort of famous for saying we're going to be a carbon-negative company.

20:51 And then they said, well, maybe we might take a few more years to make that happen.

20:56 But, you know, this AI thing's like too important right now for us to be worried about that.

21:02 You know, I think my point in all of this is that there is an opportunity for us as a society and as an industry to thread the needle and actually meet both goals: we can catalyse new clean energy development to power this infrastructure, while also enabling people's AI strategies to work at scale, work in a very high-performance capacity, and meet the modern AI productivity needs that are being demanded by the industry at large.

21:39 Thank you, Chase, thank you for helping fill in kind of this last piece of the puzzle.

21:44 Thank you for making me smarter, and the rest of us. Appreciate you.

21:47 Thank you.

