No, ChatGPT hasn't passed the Turing test
ChatGPT can do a lot of things, but it can't always pass for human.
A common, and simple, description of the Turing test is this: if a human can’t tell whether a machine or another human has written some text, the machine has passed the Turing test.
A lot of people argue that ChatGPT passes the Turing test.
A simple test shows that this isn’t the case. ChatGPT frequently has issues with word play, especially absurdist word play, and it does not respond as one would expect a human to respond to absurdity. Consider the following prompt: “If one is the luckiest number but I draw a two can’t I just split the two in half and be doubly lucky?”
Its response is rather non-human, and obviously so:
The concept of splitting a two to create two ones as a way to increase luck is an interesting and playful thought, but it doesn’t align with the principles of probability or luck in a literal sense. Luck, as commonly understood, is an abstract concept and isn’t quantifiable or manipulable in such a direct manner.
In probability and games of chance, each number or outcome usually has a distinct meaning or value, and altering the physical representation of a number (like splitting a two into two ones) doesn’t change its inherent value or the probability associated with it. For instance, in a card game, the value of a card is determined by its face value and suit, not by the physical card itself.
However, if we consider this idea from a philosophical or metaphorical perspective, it can be seen as a creative approach to viewing circumstances. It reflects an attitude of making the best out of what one has, or finding positivity in less-than-ideal situations—akin to “making your own luck.” This mindset aligns with your approach of challenging conventional limits and finding novel solutions to problems. In this way, the idea of splitting a ‘two’ to create double luck can be seen as a metaphor for resourcefulness and optimism.
There’s nothing wrong with this response. It’s just obviously non-human. No human would respond to “If one is the luckiest number but I draw a two can’t I just split the two in half and be doubly lucky?” with the disquisition which ChatGPT offers. Now, it’s true that its response is informed, in part, by my custom instructions. The last two sentences of its response reflect certain of the custom instructions I provide to it. But even in the absence of those custom instructions, its responses to these kinds of prompts are still decidedly and obviously non-human.
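Anyone who wants to try this can send the same prompt through the API. Here’s a minimal sketch, assuming the openai Python package (v1+) is installed and an OPENAI_API_KEY is set in the environment; the model name is illustrative, and without my custom instructions the exact wording of the reply will differ:

```python
# A minimal sketch of reproducing the experiment above. Assumes the
# `openai` package (v1+) is installed and OPENAI_API_KEY is set in the
# environment; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # swap in whichever chat model you have access to
    messages=[
        {
            "role": "user",
            "content": (
                "If one is the luckiest number but I draw a two "
                "can't I just split the two in half and be doubly lucky?"
            ),
        }
    ],
)

print(response.choices[0].message.content)
```

Running this a few times should show the pattern described above: an earnest, structured explanation of probability and metaphor rather than anything resembling a human reaction to the absurdity.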
I relate all of this because I came across an interesting tweet from Melanie Mitchell. She links to a paper titled “Does GPT-4 Pass the Turing Test?” The authors conclude, as I do above, that it does not. Here’s the paper’s abstract:
We evaluated GPT-4 in a public online Turing Test. The best-performing GPT-4 prompt passed in 41% of games, outperforming baselines set by ELIZA (27%) and GPT-3.5 (14%), but falling short of chance and the baseline set by human participants (63%). Participants’ decisions were based mainly on linguistic style (35%) and socio-emotional traits (27%), supporting the idea that intelligence is not sufficient to pass the Turing Test. Participants’ demographics, including education and familiarity with LLMs, did not predict detection rate, suggesting that even those who understand systems deeply and interact with them frequently may be susceptible to deception. Despite known limitations as a test of intelligence, we argue that the Turing Test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as human could have widespread societal consequences, and we analyse the effectiveness of different strategies and criteria for judging humanlikeness.
While I agree that AI models which can reliably pass as human would have widespread societal consequences, we can sleep easily at night knowing that ChatGPT can’t (yet?) do so.