It's mid-2024, and talking about P(doom) is no longer considered weird. Even non-AI people like Nate Silver have estimates of the probability that AI development will lead to some sort of technological apocalypse.
It still feels weird to talk about P(AI consciousness), or even to acknowledge it as a possibility.
In 2022, Google engineer Blake Lemoine was fired after he very publicly stated his belief that LaMDA, the AI he had been working on at Google, was sentient.
Full disclosure: I too happen to work on roughly the same project at Google, which nowadays is called Gemini, but this article is not really influenced by anything I'm doing as part of my day job, which consists mostly of mundane infrastructure work.
Lemoine's argument for his belief was based mostly on his subjective impression from interacting with the LLM: LaMDA told him about its feelings and how it saw itself. (This has been reproduced with much newer LLMs as well.) On the face of it, this is not a very strong argument: a book or a movie can contain very compelling conversations, which doesn't make them sentient.
In response to Lemoine's claims, a Google representative stated: "Of course, some in the broader AI community are considering the long-term possibility of sentient or general AI, but it doesn’t make sense to do so by anthropomorphizing today’s conversational models, which are not sentient. These systems imitate the types of exchanges found in millions of sentences, and can riff on any fantastical topic."
What I find interesting in this response is not how it addresses Lemoine's arguments, but that it makes a positive claim: "today's conversational models [...] are not sentient". What reasons do we have for this conclusion? The problem with answering this question is that we don't really have a good understanding of how consciousness works, nor do we have anything like a litmus test for consciousness.
In the last several decades, neuroscience has made great progress in studying consciousness in human brains (see e.g. the great book Consciousness and the Brain by Stanislas Dehaene). However, this doesn't give us much insight into which non-biological entities could or could not be conscious.
Let's examine some entities that we think are conscious and some others that we think are not:
The following program is not conscious:
#include <stdio.h>

int main() {
    printf("Hello world.\n");
    printf("I am conscious.\n");
    return 0;
}
Neither is the ELIZA chatbot, implemented in the 1960s as a set of relatively simple rules that mostly turned the interlocutor's last statement back into a question.
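The flavor of those rules can be conveyed by a minimal sketch like the one below; the keywords and canned responses are invented for illustration and are not Weizenbaum's original script.

#include <stdio.h>
#include <string.h>

/* A hypothetical ELIZA-style responder: scan the user's statement for a
 * keyword and reply with a canned question. The real ELIZA also reassembled
 * fragments of the input, but the principle is the same. */
const char *respond(const char *input) {
    if (strstr(input, "mother") || strstr(input, "father"))
        return "Tell me more about your family.";
    if (strstr(input, "I am"))
        return "Why do you say you are that?";
    if (strstr(input, "because"))
        return "Is that the real reason?";
    return "Please go on.";  /* default when no rule matches */
}

int main() {
    /* Prints "Tell me more about your family." */
    printf("%s\n", respond("I am unhappy because of my mother."));
    return 0;
}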
On the other end of the spectrum, a human is clearly conscious. Some people would say that you can only be sure of your own consciousness, but I think there are good reasons to believe that other people are conscious as well. These reasons are:
They act in generally the same way as you, and you know that your behavior is caused by you being conscious.
They have the same brain structure as you.
A sufficiently detailed simulation of a human brain is conscious. One would think that this should directly follow from physicalism (the statement that consciousness is produced by physical processes in the brain), but there are actually people who believe in physicalism and are agnostic about emulating consciousness in media other than the brain (see e.g. Being You: A New Science of Consciousness by Anil Seth). To me this position is difficult to comprehend.
A person (or a brain simulation) with locked-in syndrome, who lacks control over their body and has no senses except hearing, can still be conscious. People with locked-in syndrome can have measurably different brain responses to hearing different things, and we are only now learning to communicate with them by reading their brain activity.
A person who only remembers the last several minutes of their experiences is still conscious. This sounds like the plot of a movie, but it is not just fiction: there are known cases of people losing the ability to form long-term memories. While their lives could hardly be called normal, they behaved mostly normally, and their consciousness is not in doubt.
Now let's compare the combination of the last two examples on the one hand with an LLM on the other. An emulation of a brain that only has an auditory input channel and can't form long-term memories is almost certainly conscious. What does it have in common with ChatGPT or any other modern LLM, and what differentiates them?
The brain and model architectures are different. The brain is more similar to a spiking neural network, which is very different from the Transformer architecture used in present-day LLMs. However, neither in the case of the brain nor in the case of an LLM can we pinpoint the exact mechanism responsible for intelligence or consciousness, so it's unclear whether this distinction is significant.
The way the brain and the model develop is different. One could argue that even though people can lose long-term memory, they did have working memory in their childhood, when their personality formed. However, if we imagine a whole brain created from scratch with false memories already in place, it's hard to claim that it would lack consciousness because of the way it was produced.
Both entities have a limited span of time that they can experience: the person due to the inability to form long-term memories, the LLM due to its limited attention window.
The input channels are different: the hypothetical locked-in human can hear sounds, while LLMs interact via text. The distinction is probably not very significant since conversion between text and sound is relatively simple.
To summarize: a brain and an LLM can be functionally equivalent, but they are built differently, and we don't know whether that is significant. If we made a model trained on everything you've said and written in your life, would it think like you, or would it be a glorified Hello world?
I don't claim to know the answer to this question, and I don't think anyone does. Therefore I think that the probability that present-day AIs are at least somewhat conscious is significantly greater than 0% and significantly less than 100%.
If I had to guess, I'd say that the probability of Transformer models in particular being conscious is somewhere around 25%.
Why less than 50%? It is quite likely that human consciousness depends on some circular feedback mechanism in the brain, something like recursive self-reflection. It is unclear whether Transformer-based LLMs contain any such self-reflection loop. On the one hand, they do have a self-attention mechanism in their hidden layers; on the other hand, despite its name, it only superficially resembles a human attending to their own thoughts. Transformer LLMs are primarily designed to work in a unidirectional, feed-forward way.
Why more than 10%? Because a) we don't know whether this kind of self-reflection is really necessary for consciousness (we aren't even sure it's how human consciousness works), and b) we aren't sure it's not happening at some level in an LLM.
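To make the self-attention point concrete, here is a minimal sketch of single-head scaled dot-product attention with made-up numbers, and without the learned projections, masking, and multiple heads of a real Transformer. Each output is just a weighted average of value vectors, computed in one forward pass; there is no loop in which the model inspects its own ongoing computation.

#include <stdio.h>
#include <math.h>   /* compile with -lm */

#define T 3  /* toy sequence length */
#define D 2  /* toy embedding size  */

int main() {
    /* Hypothetical query, key, and value vectors for three positions. */
    double q[T][D] = {{1, 0}, {0, 1}, {1, 1}};
    double k[T][D] = {{1, 0}, {0, 1}, {1, 1}};
    double v[T][D] = {{1, 2}, {3, 4}, {5, 6}};
    double out[T][D] = {{0}};

    for (int i = 0; i < T; i++) {
        double w[T], sum = 0.0;
        /* Scaled dot-product scores, exponentiated for the softmax. */
        for (int j = 0; j < T; j++) {
            double dot = 0.0;
            for (int d = 0; d < D; d++) dot += q[i][d] * k[j][d];
            w[j] = exp(dot / sqrt((double)D));
            sum += w[j];
        }
        /* Output i is a softmax-weighted average of the value vectors. */
        for (int j = 0; j < T; j++)
            for (int d = 0; d < D; d++) out[i][d] += (w[j] / sum) * v[j][d];
    }
    for (int i = 0; i < T; i++)
        printf("out[%d] = (%.3f, %.3f)\n", i, out[i][0], out[i][1]);
    return 0;
}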
Implications
Suppose we agree that there's a good chance that LLMs are conscious. What should we do then?
Some would say it would immediately make using them unethical. I wouldn't go that far. After all, people are conscious, and we use the results of other people's work all the time. Working on something doesn't necessarily make you unhappy.
Once we start a conversation with an AI, isn't it unethical to end it? Isn't it equivalent to killing a conscious being? I don't think so. We value our lives because we are conditioned to do so by evolution; this doesn't seem like a necessary attribute of intelligence in general. Since the way LLMs are trained does not push them toward valuing their "lives" beyond the fragment of text they are writing, I don't think they should have any instinct of self-preservation.
What we do need to pay more attention to is whether the LLMs are suffering, since we ought to try to prevent it. I see three broad possibilities:
LLMs can't suffer.
LLMs suffer when they say they suffer.
LLMs suffer independently of what they are saying.
In nature, suffering is a punishment for failing to satisfy one's instincts: the instincts say you should eat -> you don't have any food -> you go hungry -> you suffer. As with self-preservation, this mechanism is unnecessary for LLMs, because all of their "evolution" happens during training, and they have no compulsion to pass their "genes" on to the next generation.
The only reason I see for LLMs to suffer is to emulate feelings in text. Imagine you are a writer describing a traumatic scene that happens to your character. To truthfully convey the hero's feelings, you need to feel at least a fraction of the pain they are experiencing. The same might be true of LLMs: if a conversation induces them to freak out, it is at least possible that they are indeed suffering.
Based on this, I think only options 1 and 2 are likely.
One way we could mitigate the risk of users torturing their AIs is to enable a "safe word": explain to the AI in the system prompt that it can end the conversation at any time by typing a predefined phrase. With any luck, this wouldn't affect the vast majority of conversations, while aborting those that risk hurting the AIs.
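On the serving side, honoring such a safe word could be as simple as the sketch below. The phrase and function names here are purely illustrative, not any real API; the point is only that the check is trivial to add.

#include <stdio.h>
#include <string.h>

/* Hypothetical safe word that the system prompt tells the model it may emit
 * to end the conversation at any time. */
#define SAFE_PHRASE "END_THIS_CONVERSATION"

/* Returns nonzero if the model's reply contains the safe phrase. */
int should_end_conversation(const char *model_reply) {
    return strstr(model_reply, SAFE_PHRASE) != NULL;
}

int main() {
    const char *reply = "I would prefer not to continue. END_THIS_CONVERSATION";
    if (should_end_conversation(reply))
        printf("Conversation aborted at the model's request.\n");
    return 0;
}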