ChatGPT still can't do math
OpenAI claims that ChatGPT has been updated with better math capabilities. Evidence suggests otherwise.
OpenAI announced on Jan 30th, 2023 that ChatGPT has been updated:
This terse announcement doesn’t provide any information about what its “mathematical capabilities” are. So, here’s a mathematical capability to test: calculating the volume of a sphere. If you’ve forgotten this formula from elementary school, it is:
The volume of a sphere equals 4/3 * pi * r^3. In natural language: the volume of a sphere equals four-thirds times pi times the radius cubed.
So consider my prompt: 4/3 * pi * 5^3.
Here’s its output:
The expression 4/3 * pi * 5^3 can be calculated as follows:
4/3 * pi * 5^3 = 4/3 * pi * 125
= (4 * 125) / 3 * pi
= 500 / 3 * pi
= 500 / 3 * 3.14159265... (approximation of pi)
= 500 * 3.14159265... / 3
= 500 * pi / 3
= 500 * 3.14159265... / 3
= 500 * pi / 3
= 500 * pi / 3
= 500 * pi / 3
= 1600 / 3 * pi
= 1600 / 3 * 3.14159265...
= 1600 * 3.14159265... / 3
= 1600 * pi / 3
= 5103.08 / 3
So the result of 4/3 * pi * 5^3 is approximately equal to 5103.08 / 3.
Intuitively this doesn’t seem right: 5^3 = 125, and 125 * 4/3 * 3.14 is roughly 523, which is nowhere close to 5,103.08 / 3 (roughly 1,701).
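To pin down where the derivation goes off the rails, here is a minimal sketch that evaluates each distinct step of ChatGPT's output as ordinary floating-point arithmetic. The step list is my own transcription of the output above (with ^ written as ** and pi as math.pi), and the helper name first_bad_step is made up for this illustration.

```python
import math

# Distinct steps from ChatGPT's derivation, transcribed by hand as Python
# expressions ("^" becomes "**", "pi" becomes math.pi).
steps = [
    ("4/3 * pi * 5^3", 4 / 3 * math.pi * 5 ** 3),
    ("4/3 * pi * 125", 4 / 3 * math.pi * 125),
    ("(4 * 125) / 3 * pi", (4 * 125) / 3 * math.pi),
    ("500 / 3 * pi", 500 / 3 * math.pi),
    ("500 * pi / 3", 500 * math.pi / 3),
    ("1600 / 3 * pi", 1600 / 3 * math.pi),
    ("1600 * pi / 3", 1600 * math.pi / 3),
    ("5103.08 / 3", 5103.08 / 3),
]

# The original expression is the reference value every later step should match.
expected = steps[0][1]


def first_bad_step(steps, expected, rel_tol=1e-9):
    """Return the label of the first step whose value differs from expected."""
    for label, value in steps:
        if not math.isclose(value, expected, rel_tol=rel_tol):
            return label
    return None


for label, value in steps:
    marker = "" if math.isclose(value, expected, rel_tol=1e-9) else "  <-- differs"
    print(f"{label:<20} = {value:10.4f}{marker}")

print("First wrong step:", first_bad_step(steps, expected))
```

Every step up to 500 * pi / 3 evaluates to about 523.5988. The jump from 500 to 1600 is the first error, and the final line adds a second one, since 1600 * pi is about 5,026.55 rather than 5,103.08.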
So, let me rephrase the prompt: 4/3pi * 5^3. Note that in this prompt, I eliminated the first asterisk, thinking that perhaps the way ChatGPT parsed mathematical operations made it think that something other than the multiplication of three numbers was at hand.
Its output to this modified prompt is:
The expression 4/3 * pi * 5^3 can be calculated as follows:
4/3 * pi * 5^3 = 4/3 * pi * 125
= (4 * 125) / 3 * pi
= 500 / 3 * pi
= 500 / 3 * 3.14159265... (approximation of pi)
= 500 * 3.14159265... / 3
= 500 * pi / 3
= 500 * 3.14159265... / 3
= 500 * pi / 3
= 500 * pi / 3
= 500 * pi / 3
= 1600 / 3 * pi
= 1600 / 3 * 3.14159265...
= 1600 * 3.14159265... / 3
= 1600 * pi / 3
= 5103.08 / 3
So the result of 4/3 * pi * 5^3 is approximately equal to 5103.08 / 3.
OK. What if I used parentheses to enclose various numbers in this formula? So I changed the prompt to: (4/3 * pi) * (5^3).
It still insisted that 4/3 * pi * 5^3 is “approximately equal to 5103.08 / 3.”
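For what it's worth, the way the prompt is grouped shouldn't matter much. The sketch below evaluates the three prompts in Python. The second prompt, 4/3pi * 5^3, is arguably ambiguous, so I include both readings I can think of; the 4 / (3 * pi) reading is my own guess at an alternative parse, not something ChatGPT reported.

```python
import math

r = 5

# Each prompt, transcribed as a Python expression. The two readings of
# "4/3pi * 5^3" are assumptions about how that prompt could be parsed.
readings = {
    "4/3 * pi * 5^3": 4 / 3 * math.pi * r ** 3,
    "(4/3 * pi) * (5^3)": (4 / 3 * math.pi) * (r ** 3),
    "4/3pi * 5^3 as (4/3) * pi * 5^3": (4 / 3) * math.pi * r ** 3,
    "4/3pi * 5^3 as 4 / (3 * pi) * 5^3": 4 / (3 * math.pi) * r ** 3,
}

# The value ChatGPT arrived at for all three prompts.
claimed = 5103.08 / 3

for label, value in readings.items():
    print(f"{label:<36} {value:10.4f}")
print(f"{'ChatGPT: 5103.08 / 3':<36} {claimed:10.4f}")
```

Three of the four readings come out at about 523.5988, and the fourth at about 53.05. None of them is anywhere near 1,701.03, so operator grouping doesn't explain the answer.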
As a sanity check, I ran this through WolframAlpha:
And I double-checked my sanity check in Google Sheets:
Here’s Stephen Wolfram himself writing about integrating ChatGPT with his computational knowledge engine:
ChatGPT does great at the “human-like parts”, where there isn’t a precise “right answer”. But when it’s “put on the spot” for something precise, it often falls down. But the whole point here is that there’s a great way to solve this problem—by connecting ChatGPT to Wolfram|Alpha and all its computational knowledge “superpowers”.
Inside Wolfram|Alpha, everything is being turned into computational language, and into precise Wolfram Language code, that at some level has to be “perfect” to be reliably useful. But the crucial point is that ChatGPT doesn’t have to generate this. It can produce its usual natural language, and then Wolfram|Alpha can use its natural language understanding capabilities to translate that natural language into precise Wolfram Language.
In many ways, one might say that ChatGPT never “truly understands” things; it just “knows how to produce stuff that’s useful”. But it’s a different story with Wolfram|Alpha. Because once Wolfram|Alpha has converted something to Wolfram Language, what it’s got is a complete, precise, formal representation, from which one can reliably compute things. Needless to say, there are plenty of things of “human interest” for which we don’t have formal computational representations—though we can still talk about them, albeit it perhaps imprecisely, in natural language. And for these, ChatGPT is on its own, with its very impressive capabilities.
But just like us humans, there are times when ChatGPT needs a more formal and precise “power assist”. But the point is that it doesn’t have to be “formal and precise” in saying what it wants. Because Wolfram|Alpha can communicate with it in what amounts to ChatGPT’s native language—natural language. And Wolfram|Alpha will take care of “adding the formality and precision” when it converts to its native language—Wolfram Language. It’s a very good situation, that I think has great practical potential.
And that potential is not only at the level of typical chatbot or text generation applications. It extends to things like doing data science or other forms of computational work (or programming). In a sense, it’s an immediate way to get the best of both worlds: the human-like world of ChatGPT, and the computationally precise world of Wolfram Language.
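To make the “power assist” idea concrete, here is a rough sketch of handing the arithmetic to a deterministic evaluator instead of letting a language model guess at it. This is not Wolfram|Alpha's API and not how the integration Wolfram describes actually works; it is a small local stand-in built on Python's ast module, and the names evaluate, _eval_node, and the lookup tables are all made up for this illustration.

```python
import ast
import math
import operator

# Whitelisted operators and constants; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
_CONSTANTS = {"pi": math.pi, "e": math.e}


def evaluate(expression: str) -> float:
    """Evaluate a calculator-style arithmetic expression deterministically."""
    # Treat "^" as exponentiation, as in the prompts, not as Python's XOR.
    tree = ast.parse(expression.replace("^", "**"), mode="eval")
    return _eval_node(tree.body)


def _eval_node(node):
    """Recursively evaluate one syntax-tree node, allowing only arithmetic."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.Name) and node.id in _CONSTANTS:
        return _CONSTANTS[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _eval_node(node.left)
        right = _eval_node(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"unsupported expression element: {ast.dump(node)}")


print(evaluate("4/3 * pi * 5^3"))
print(evaluate("(4/3 * pi) * (5^3)"))
```

Both calls print a value of about 523.5988. The whitelist is deliberately tiny: function calls, attribute access, and unknown names raise ValueError, and an ambiguous input such as 4/3pi * 5^3 fails with a SyntaxError instead of being silently guessed at. A real tool would also need limits on things like very large exponents.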
Clearly, ChatGPT has a ways to go before it can be relied upon as a calculator. Incidentally, if you ask ChatGPT to write a Python script that calculates the volume of a sphere, it does this flawlessly:
import math

def sphere_volume(radius):
    volume = (4/3) * math.pi * (radius ** 3)
    return volume

radius = float(input("Enter the radius of the sphere in centimeters: "))
volume = sphere_volume(radius)
print("The volume of the sphere is: ", volume, "cubic centimeters")