hmmm, I think you've explained the "stochastic parrot" point of view reasonably well, but not everyone agrees with this interpretation. How confident are you that the stochastic parrot view is correct? What evidence would change your mind?
Thanks, and good question. I would say I’m confident in the “stochastic parrot” perspective insofar as it effectively describes the mechanism of large language models (LLMs) as probabilistic systems trained to predict text sequences without inherent understanding. However, I also recognize that the “stochastic parrot” view might oversimplify certain emergent behaviors in these systems, such as their apparent ability to generalize across domains or perform tasks that seem to require reasoning.
Evidence that could change my mind would need to demonstrate that LLMs exhibit something akin to genuine comprehension or intentionality rather than sophisticated pattern recognition. For example, if an LLM consistently demonstrated original insights or innovations in ways not reducible to patterns found in its training data, that would challenge the “stochastic parrot” framing. Similarly, if researchers identified mechanisms within LLMs analogous to human cognitive processes (beyond statistical pattern-matching), it would suggest a richer explanation is needed.
Thank you for your reply, Dave.
Have you seen this research?
https://arxiv.org/abs/2411.06198
Personally, my best guess is that the “stochastic parrot” process does often happen, but at least some of the time the results seem hard to explain with that model alone. I can contrive some examples if the ones in the paper don't satisfy you.
In my academic work I mostly use smaller LLMs, but my intuition is that the pastiche-only view is missing something important (particularly as models get larger). Either way, I do appreciate your perspective.