Dave, we’d love to push our inference needs to a client’s PC, but here we are streaming tokens from the foundation model builders. New insight: we dropped GPT-5 in and it adds about a minute to inference time. Customers like it because the results are better, and also because the extra minute gives the “feeling” of a higher-quality response. So not only are we genuinely better than before, there is a psychic benefit for our clients too.
Centralized inference makes sense now. But (1) I don’t think it will stay that way for a class of inference problems, and (2) there will always be inference that has to be performed in the cloud due to complexity, size, etc.
What is the implication of this trend for NVIDIA? Will there still be such high demand for GPUs two years from now?
Inference isn’t a zero-sum world. Think of it like Wi-Fi and cellular data: Wi-Fi bled off a lot of demand from cellular networks, but new demands for cellular data traffic replaced what Wi-Fi had taken. Similarly, simple inference tasks will move to devices like mobile phones, tablets, and sensors, while complex, large-scale inference will remain in the cloud, and that complex inference will absorb any excess capacity.
Would love to hear your thoughts on interpretability in the context of enterprise adoption for high-value use cases.