If Code Interpreter says it's true, who am I to judge?
We are definitely at the "try to break the model" phase. This is a super important phase with any emerging tech I actually want to use: I have to make sure I fully understand what it CAN'T do, then figure out what I can do with it.
Of course, the list of things it can't do tends to shrink over time, too!
Great example, and an important one: it shows both a reasonable but wrong wording of the request and a far better one. The total # of CA drivers (summed over many years), at 576 mln, is a less useful answer than the total # of CA drivers in 2018. 27 mln is the (more?) correct answer. More folks should guess first. I would have guessed somewhere between 3/5 and 3/4, or up to 4/5, of CA's population of 36 mln.
Smart folks who are willing to estimate first, and then check more carefully whenever the AI's answer falls outside their range, seem more likely to avoid wildly wrong answers.
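The guess-first habit can be made mechanical. Below is a minimal Python sketch. It uses the figures from the comment above (a 36 mln CA population and a guess of 3/5 up to 4/5 of it). The helper names `plausible_range` and `needs_closer_check` are hypothetical and exist only for this illustration.

```python
from typing import Tuple


def plausible_range(population_mln: float, low_frac: float, high_frac: float) -> Tuple[float, float]:
    """Hypothetical helper: bounds (in millions) for a count guessed as a fraction of population."""
    # Round to one decimal so float noise (e.g. 21.599999999999998) doesn't clutter the output.
    return round(population_mln * low_frac, 1), round(population_mln * high_frac, 1)


def needs_closer_check(answer_mln: float, bounds: Tuple[float, float]) -> bool:
    """Hypothetical helper: True if an AI answer falls outside the guessed range."""
    low, high = bounds
    return not (low <= answer_mln <= high)


# Figures taken from the comment: 36 mln population, guess of 3/5 up to 4/5 of it.
bounds = plausible_range(36, 3 / 5, 4 / 5)
print(bounds)                         # (21.6, 28.8)
print(needs_closer_check(576, bounds))  # True: the "many years" total is wildly off
print(needs_closer_check(27, bounds))   # False: the 2018 figure sits inside the guess
```

The range check only flags an answer for a closer look. It does not prove an in-range answer is right, and it only helps if the original guess is reasonable.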