5 Comments
Drew Meister

Dave, we’d love to push our inference needs to a client’s PC, but here we are streaming tokens from the foundation model builders. New insight: we dropped GPT-5 in and it adds about a minute to inference time. Customers like it because the results are better, and also because the extra minute gives the “feeling” of a higher-quality response. So not only are we genuinely better than before, there is a psychic benefit for our clients too.

Dave Friedman

Centralized inference makes sense now. But (1) I don’t think it will stay that way for a certain class of inference problems, and (2) there will always be inference that has to be performed in the cloud due to complexity, size, etc.

Gossling

What is the implication of this trend for NVIDIA? Will there still be such high demand for GPUs two years from now?

Dave Friedman

Inference isn't a zero-sum world. Think of it like Wi-Fi and cellular data. Wi-Fi bled off a lot of demand from cellular networks, but new demand for cellular data traffic replaced the traffic that Wi-Fi bled off. Similarly, simple inference tasks will move onto devices like mobile phones, tablets, and sensors, but complex, large-scale inference will remain in the cloud, and that complex inference will absorb any excess capacity.

Granville Martin

Would love to hear your thoughts on interpretability in the context of enterprise adoption for high-value use cases.
