13 Comments
Jorg

I love this Explainer, it clarifies some things for me, and it brought back a beautiful memory.

In the early 1990s, a dear friend was running his family business, a local county newspaper and Sales Sheet. He wanted to move to high-end Macs, which at the time I believe already ran around $5K each. Maybe more.

He had an old lead slug Linotype sitting where it always did, in a back room. I had watched him "sling hot lead type" on it back in the 1960s. As a side note, watching him helped me to learn to read backwards and upside down.

He found, or someone found for him, a buyer in Brazil. He got $50K for it and they paid to have it dismantled (with some help from him) and shipped.

Why did they want to pay that much for an "obsolete" hunk of machinery? Well, it was going to a largish city in the Brazilian interior, where electrical power was not always reliable. With more modern methods, a power interruption tended to mean losing everything queued to print and starting over. With the Linotype, if the power shut down, it resumed right where it left off when the power came back on.

Win-win for both parties. And of course the Linotype had been fully depreciated back in the 60s.

And the depreciated value of a thing seriously depends on what it is useful for, and for how long, right?

Dave Friedman

Yeah, this is exactly like the GPU story. An H100 that’s worthless for frontier training still has enormous value for inference, and below that, for long-tail compute workloads that don’t need anything close to the latest generation.

Juan Miguel Agudo

Complex systems are always oversimplified, and the public (us included) usually only gets to see things through that distorted lens.

Thanks for the detailed analysis!

Also, I guess not all hyperscalers are equal, right?

Some may get to profit on the long tail and others won't, simply because they don't offer the services that a given piece of hardware, in that specific span of its useful life, provides.

Dave Friedman

Yes, this is a good point. A hyperscaler like Google or Amazon has a much deeper long-tail demand pool to absorb older chips than a pure-play GPU cloud like CoreWeave or Lambda. An H100 inside Google has a longer economic life than one inside CoreWeave. That's another dimension the single-curve model misses entirely.
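To make that concrete, here's a toy sketch in Python. Every number in it (the price, the tier horizons, what each tier pays) is a hypothetical placeholder, not data from the post; the point is only the shape: a chip's value at any moment is the best offer from whichever demand tiers are still bidding, which a single straight-line curve can't represent.

```python
# Hypothetical illustration: single-curve depreciation vs. tiered
# residual value. All prices, horizons, and fractions are made up.

PURCHASE_PRICE = 30_000  # assumed H100 price, USD (placeholder)

def single_curve(year: int, life: int = 4) -> float:
    """Straight-line depreciation to zero over `life` years."""
    return max(0.0, PURCHASE_PRICE * (1 - year / life))

# (demand tier, years it keeps bidding, fraction of price it pays)
TIERS = [
    ("frontier training", 2, 1.00),
    ("inference",         5, 0.40),
    ("long tail",         8, 0.10),
]

def tiered_value(year: int, tiers=TIERS) -> float:
    """Value = best offer from any demand tier still interested."""
    offers = [PURCHASE_PRICE * frac
              for _, horizon, frac in tiers if year < horizon]
    return max(offers, default=0.0)

for year in range(9):
    print(year, f"{single_curve(year):>8.0f}", f"{tiered_value(year):>8.0f}")
```

A pure-play cloud without long-tail customers effectively drops the last tier, so the same chip hits zero around year 5 instead of year 8 in this toy model. That owner-specific tail is exactly what the single curve can't express.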

Juan Miguel Agudo

Thanks for your comment!

kledis

Is there enough of a market for the long tail of the curve? If inference obsolescence floods that market within four years, there won't be much value left.

Jacky Li

Great, nuanced explanation of chip depreciation. Despite the nuance, it's obvious that a GPU's useful life is longer than the two years Burry suggested.

Jon Rowlands

You see the same with semiconductor process nodes. Older processes remain profitable for a long time, but with different customers. At the extreme, multi-project wafers.

Yummy

I have to think that in four to five years, or sooner, and especially with hyperscaler competition, GPUs will be hard to improve much further for the additional cost… your thoughts?

Dave Friedman

If GPU generational improvements start plateauing, which is plausible as we push up against physical limits, that actually extends the economic life across all three curves.
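A quick back-of-the-envelope version of that effect. Both the 4x exit threshold and the improvement rates below are pure assumptions, but they show how slower generational gains directly stretch the time it takes to hit any obsolescence threshold:

```python
import math

def years_until_obsolete(threshold: float, annual_gain: float) -> float:
    """Years until the frontier is `threshold`x ahead of this chip,
    assuming perf-per-dollar compounds at `annual_gain` per year."""
    return math.log(threshold) / math.log(1 + annual_gain)

# Hypothetical rule of thumb: a chip exits a given market once the
# frontier is 4x ahead. Compare fast vs. plateauing improvement.
for gain in (0.60, 0.40, 0.20):
    print(f"{gain:.0%}/yr improvement -> "
          f"{years_until_obsolete(4, gain):.1f} years of life")
```

With these made-up numbers, cutting annual gains from 60% to 20% stretches time-to-obsolescence from roughly 3 years to roughly 7.6, and the same stretching applies to each of the three curves, since each curve is just a different exit threshold.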

Les Barclays

Really interesting read! The long-tail compute obsolescence curve is the most underdiscussed part I've seen so far. Even in my own analysis, I didn't really consider the long tail. This piece shines a light on what I didn't pick up, and I think that's why my own piece felt challenging to write: I couldn't shake the feeling that something was missing. Thanks for helping me connect the dots.

I can see what you're saying about how this is hard to forecast. Since Nvidia's T4 is used for inference (amongst other workloads), wouldn't its higher efficiency slow down the obsolescence curve, especially toward the long tail?

Dave Friedman

Thanks.

As for T4s, it's not so much that the T4's efficiency slows its obsolescence curve. It's that the workloads it serves have low performance thresholds. If you're using it for inference on a small model or running a recommendation engine, it doesn't need to be efficient. It just needs to be cheaper than the alternative.
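The arithmetic behind "just cheaper than the alternative" fits in a few lines. The hourly costs and throughput figures below are hypothetical placeholders, not benchmarks:

```python
# Toy break-even: keep running an old T4 as long as its cost per
# million tokens beats renting newer hardware. Numbers are made up.

def cost_per_m_tokens(hourly_cost: float, tokens_per_sec: float) -> float:
    """Dollars per million tokens at a given hourly cost and throughput."""
    return hourly_cost / (tokens_per_sec * 3600) * 1_000_000

t4   = cost_per_m_tokens(hourly_cost=0.10, tokens_per_sec=400)   # old, slow, cheap
h100 = cost_per_m_tokens(hourly_cost=2.50, tokens_per_sec=6000)  # new, fast, pricey

print(f"T4:   ${t4:.3f} per M tokens")
print(f"H100: ${h100:.3f} per M tokens")
# A low-threshold workload only cares which line is smaller.
```

With these toy numbers the T4 wins on cost per token despite being far slower, and for a low-threshold workload that comparison is the whole decision.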

Les Barclays

Thanks for clarifying!