Hmm…your three possible solutions ignore the one that seems to me to be the most likely: that we develop better and more efficient ways of ingesting data that exists outside the Internet. The Google Books project of scanning volumes from libraries is one part of this, but there are tons of data and information that are not currently online. My guess is that this is where the near/mid-term data growth comes from.


That's a good point. I've seen a few startups that are betting on being able to train models on data that isn't available on the internet.
