Thoughts on Leopold Aschenbrenner's short AGI…

Jun 6, 2024

Can we solve the problem of not having enough internet data to train LLMs, in the timeline proposed?

2 Comments

Jun 7, 2024

Hmm… your three possible solutions ignore the one that seems most likely to me: that we develop better and more efficient ways of ingesting data that exists outside the Internet. The Google Books project of scanning volumes from libraries is one part of this, but there is a great deal of data and information that is not currently online. My guess is that this is where the near- and mid-term data growth will come from.

Dave Friedman

Jun 7, 2024

That's a good point. I've seen a few startups that are betting on being able to train models on data that isn't available on the internet.

Buy the Rumor; Sell the News