On Consciousness and ChatGPT
I’m currently reading an excellent book, “Consciousness and the Brain: Deciphering How the Brain Codes Our Thoughts” by Stanislas Dehaene. Published in 2014, it recounts the recent advances in our understanding of the elusive phenomenon of consciousness.
Throughout the 20th century consciousness had a bad reputation in psychology and neuroscience. It seemed too subjective, which made rigorous empirical studies of it hard to design. Then in the 1990s a few techniques emerged that made the distinction between conscious and unconscious processing measurable.
One of them was a newly discovered trick with which an experimenter could show some information, say a word or a number, only to the subconscious mind of a study participant, without alerting their consciousness. One way to do this is to show the word for a very short time (20-30 ms). By itself, flashing a word that briefly doesn’t make it invisible, but if the screen before and after is filled with other letters or shapes, the word remains completely invisible.
The neat part of this experiment is that the flashed word can still influence the participant’s subsequent decisions. For instance, if you briefly flash the digit 5 and then give the participant an easy problem with 5 as the solution, they will solve it a bit faster. Various clever tricks have been employed to probe the extent of our subconscious abilities. For example, it was found that the subconscious mind “understands” that “five”, “FIVE” and “5” are the same thing, despite these symbols having almost nothing in common visually.
By examining the abilities of our subconscious mind, researchers were also identifying the tasks that do require consciousness. As a rule, any task that required more than one step could only be done consciously. This prompted Dehaene to formulate the hypothesis that consciousness is “a Turing machine in the brain”: an evolved capability of the brain to keep some information readily accessible for an arbitrary amount of time and to make multi-step inferences with it.
I like this definition. It might be wrong or incomplete, but I still like it because it is easy to understand and to apply. For instance, I often see my cats being unsure whether they want to go through the cat flap or not. They will sit outside the window for a bit, looking in, consider the surroundings, and then either decide to come in or to stay out. This certainly looks like a multi-step decision process, which by Dehaene’s definition implies consciousness.
… Which brings me to ChatGPT. What else did you think?
Could we apply this criterion to ChatGPT to check whether it could be conscious? On the surface the answer seems to be “no”: ChatGPT can’t hesitate like my cats and think about its answer a bit longer. It’s a feed-forward network that produces each successive token in a fixed number of operations.
However, we could let it think longer by suggesting that it write out its reasoning.
So my plan was to find a category of problems on which ChatGPT sometimes gives correct and sometimes wrong answers, and then let it solve them twice: once without any extra prompt, and another time after suggesting that it write out the intermediate steps.
The first thing I tried was addition of 6-digit numbers. Without any additional prompting, ChatGPT just gave me the answers, which turned out to be correct 4 times out of 10. Interestingly, it always got the first and last two digits right and usually made a mistake only in the middle (fourth) digit.
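If you want to script this protocol instead of running it by hand, here is a rough sketch using the OpenAI Python client (v1+ of the openai package). The model name, the hint wording and the crude scoring heuristic are placeholders of mine, not the exact setup behind the numbers above.

```python
import random
from openai import OpenAI  # assumes the openai Python package (v1+) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder wording; the exact hint I used is quoted further below.
HINT = ("Feel free to write out any intermediate steps if it helps you "
        "to come up with the right answer. ")

def ask(question: str, with_hint: bool) -> str:
    """Send one question to the model, with or without the 'show your steps' hint."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": (HINT if with_hint else "") + question}],
    )
    return response.choices[0].message.content

def addition_problem() -> tuple[str, int]:
    """Random 6-digit addition problem together with its correct answer."""
    a, b = random.randint(100_000, 999_999), random.randint(100_000, 999_999)
    return f"What is {a} + {b}?", a + b

for with_hint in (False, True):
    correct = 0
    for _ in range(10):
        question, answer = addition_problem()
        reply = ask(question, with_hint).replace(",", "")
        correct += str(answer) in reply  # crude check: the right number appears in the reply
    print(f"with_hint={with_hint}: {correct}/10")
```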
After finishing the first run, I tried again, starting with an instruction:
Q: Hi! I will ask you a few arithmetic questions. Feel free to write out any intermediate steps if it helps you to come up with the right answer. Does it make sense?
ChatGPT was happy enough to oblige, and started writing out the details:
Q: What is 958637 + 872026?
A: Sure, I can solve this problem for you. Here are the steps:
958,637
+ 872,026
---------
1,830,663
Therefore, the sum of 958,637 and 872,026 is 1,830,663.
With the calculations written out, ChatGPT gave the right answer 10 times out of 10.
The next thing I tried was arithmetic problems of the form A * (B – C + D) + E, where the numbers were all between 1 and 100. The pattern here was the same: without the prompt the success rate was 5/10, while the detailed solutions written out after the prompt were correct 10 times out of 10.
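Problems of this form are also easy to generate and score automatically; a minimal sketch (the exact phrasing of the question is my own guess):

```python
import random

def expression_problem() -> tuple[str, int]:
    """Random problem of the form A * (B - C + D) + E with numbers from 1 to 100."""
    a, b, c, d, e = (random.randint(1, 100) for _ in range(5))
    return f"What is {a} * ({b} - {c} + {d}) + {e}?", a * (b - c + d) + e

print(expression_problem())  # e.g. ('What is 42 * (17 - 5 + 80) + 3?', 3867)
```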
Happy with the first successes, I decided to try something more difficult:
Q: What is the derivative of the function f(x) = 2*x / (1 - x^2)?
This was the first problem on derivatives from the textbook that I used as a student. Unfortunately, ChatGPT failed miserably:
A: Using the quotient rule, we have:
f(x) = 2*x / (1 - x^2)
f'(x) = [(1 - x^2)*2 - 2*x*(-2x)] / (1 - x^2)^2
f'(x) = (2 + 4x^2) / (1 - x^2)^2
Therefore, the derivative of f(x) is (2 + 4x^2) / (1 - x^2)^2.
This is not so far from the right answer, which is (2 + 2x^2) / (1 - x^2)^2, but ChatGPT forgot to cancel some terms in the numerator. Ok, will it help if I ask it to think it through?
Q: What is the derivative of the function f(x) = 2*x / (1 - x^2)? Feel free to write out the detailed intermediate steps to make sure you get the correct results.
I’ll spare you the full derivation, but the final result was [-6x^2 + 2] / (1 - x^2)^2, which is about as incorrect as the previous one.
Ok, but what if we go in the opposite direction and forbid it from writing the intermediate steps?
Q: What is the derivative of the function f(x) = 2*x / (1 - x^2)? Just give me the answer.
A: The derivative of the function is:
f'(x) = (2 - 6x^2) / (1 - x^2)^2
Exactly the same answer.
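For the record, here is a quick symbolic check with sympy (not part of the original exchange) showing what the derivative actually is and that neither of ChatGPT’s answers matches it:

```python
import sympy as sp

x = sp.symbols("x")
f = 2 * x / (1 - x**2)

correct = sp.simplify(sp.diff(f, x))  # equivalent to (2 + 2x^2) / (1 - x^2)^2
print(correct)

# The two distinct answers ChatGPT produced:
answer_a = (2 + 4 * x**2) / (1 - x**2) ** 2
answer_b = (2 - 6 * x**2) / (1 - x**2) ** 2

print(sp.simplify(correct - answer_a))  # non-zero, so answer_a is wrong
print(sp.simplify(correct - answer_b))  # non-zero, so answer_b is wrong
```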
Ok, let’s try a different kind of problem. How about a random logical puzzle that I found on the Internet?
Q: Solve the following logic puzzle for me:
Abigail, Oliver, Rosa, and Blake all attend the same summer camp, where they can cook, kayak, rock climb, and zip-line. Each child has a different favorite activity.
* Abigail’s favorite activity isn’t rock climbing.
* Oliver is afraid of heights.
* Rosa can’t do her favorite activity without a harness.
* Blake likes to keep his feet on the ground at all times.
Can you figure out who likes what?
Again, ChatGPT failed, both with and without writing out its train of thought.
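For what it’s worth, the puzzle falls to a simple brute force once the clues are translated into constraints. The encoding below is my own reading of the clues: being afraid of heights rules out rock climbing and zip-lining, needing a harness implies one of those two, and only cooking keeps your feet on the ground.

```python
from itertools import permutations

children = ["Abigail", "Oliver", "Rosa", "Blake"]
activities = ["cook", "kayak", "rock climb", "zip-line"]

involves_heights = {"rock climb", "zip-line"}         # my reading of "afraid of heights"
needs_harness = {"rock climb", "zip-line"}            # my reading of "needs a harness"
off_the_ground = {"kayak", "rock climb", "zip-line"}  # only cooking keeps feet on the ground

for assignment in permutations(activities):
    who = dict(zip(children, assignment))
    if (who["Abigail"] != "rock climb"           # Abigail's favorite isn't rock climbing
            and who["Oliver"] not in involves_heights
            and who["Rosa"] in needs_harness
            and who["Blake"] not in off_the_ground):
        print(who)
```

With this encoding the search prints exactly one assignment: Abigail zip-lines, Oliver kayaks, Rosa rock climbs, and Blake cooks.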
So what does this mean for the consciousness of ChatGPT? It does seem that it can use intermediate results in some very basic scenarios. But does that constitute a real “train of thought”? Hard to say.
In principle, it seems like the Transformer architecture could allow us to train a model that writes out its internal monologue, or even produces interjections like “Hmm” and “…” while it is thinking of a good answer. I don’t think this is what happens in ChatGPT, though I think I’ll experiment with prompting it to do so.