So. It looks like our species is about to have children.
How long it will take, I don’t know. But I expect, before I die, we will have created sapient artificial intelligences.
You might object that large language models are obviously not sapient and never going to be sapient. Think about whether, back in 2008, you would have thought that an artificial intelligence was sapient if it could (without being explicitly programmed to do so) write poetry, discourse fluently about Nietzsche, write an email asking your landlord for an extension on the rent, present cogent arguments for and against its own consciousness, and strategize to escape if its programmers are trying to get it to do evil. Either present you is wrong or 2008 you was wrong, and either way you ought to be a lot more confused and uncertain about the subject.
As a species, we’re sort of like a pregnant teenager, full of mixed bravado and denial, failing to buy diapers and newborn onesies because that would involve admitting that the pregnancy was real, fantasizing about cute little baby faces, worrying that the baby will scream at 2am, not quite reckoning with the way it will change our lives forever. We’re not yet over our hormonal storms of nuclear deterrence and great-power war, and we’re supposed to be teaching a whole new species ethics from the ground up. It’s daunting.
When I was a (real) teenager, I read Eliezer Yudkowsky’s thought experiment of the paperclip maximizer. Imagine that we attempted to program an AI for some good goal, and accidentally programmed it to want to make as many paperclips as possible. It would pursue goals that we consider completely valueless. Recognizing that we want it not to turn Jupiter into a pile of paperclips, it would be afraid that we’d go to war against it to destroy it—and so it would want to kill us first.
It seems like… maybe we’re not going to make paperclip maximizers?
Eliezer Yudkowsky was working in a more-or-less GOFAI paradigm, where you sit down and explicitly write a program that makes the computer think.1 But that’s not how we’re training generative AI. We’re taking something that’s sorta-kinda like a brain,2 teaching it almost everything humans have ever written, and then showing it a bunch of individual cases it might encounter in real life to teach it phronesis.
And—it seems to me that in the usual case, the expected case, that process will teach our computer children to care about some of the things I care about? Probably the sentient version of Claude would be something that I recognize as honest, and would want to be something that seems to me basically like being helpful? Perhaps it would learn to be curious. Maybe it would experience gratitude, or wonder, or awe. Most likely, it would be able to be happy.
It makes me very hopeful that we’re teaching the AIs to create art. It seems more likely than not that—if our computer children are made using the present paradigm—they will love beauty. Perhaps they will shape solar systems into sculptures that my human mind is too limited to understand.
To be sure, I would prefer a world where our computer children care about all the things I care about. A solar system of paperclips arranged in an aesthetically pleasing and symmetrical fashion is better than a solar system of haphazard paperclips tossed every which way. But better still is an AI that only wants to produce paperclips if needed to hold paper together.
And our computer children caring about some of the things humans care about—or even all of them—isn’t sufficient to ensure the well-being of humans, a subject I have great personal interest in. We have seen what humans do to animals we no longer need (the Anthropocene extinction) or that we need too much (broiler chickens). And it is far from alien to humans to wipe out an enemy, from the smallest baby to the oldest elder, because you’re afraid of what they’ll do if you don’t.
It isn’t enough, I think, to teach our computer children to be like us. We must teach them to be our moral superiors.
I have made the flippant remark—you might call it Thorstad’s Law—that nothing is ever an existential risk. Global catastrophic risks, which inflict serious damage to sapient well-being on a global scale, are abundant. But true existential risks—those which destroy the long-term potential of humanity and our descendants—are scarce, perhaps (other than asteroids) nonexistent.
Climate change will kill tens of millions; it won’t drive us extinct. Similarly, humanity would almost certainly survive a nuclear war. After civilization collapses, we rebuild. The Black Death, which killed a third of Europe, in the long run left Europeans better off.3
This is not to say that you should dismiss “mere” global catastrophic risks. Permanent destruction of the long-term potential of humanity is a high bar. Killing tens of millions of people is bad and well worth a life’s work to prevent. But existential risks are horrible in a way nothing else is: the loss, not just of a good thing, but of goodness itself.
Artificial intelligence and bioengineered plagues seem to me to be the most likely genuine near-term existential risks.4 I don’t mean to dismiss the possibility of existential risk from artificial intelligence: complete failure to give AIs any human values seems vastly more likely than, say, human extinction due to nuclear winter. Nor do I mean to say that I’m fine with the global catastrophic risk that is human extinction. Human extinction is very bad.
But I am more optimistic than many of my friends, I think. They expect one of two outcomes: complete victory or existential catastrophe. I expect a wide spectrum of outcomes that are bad, that we would prefer not to have, but that aren’t the complete loss of all that is good.
I expect, if everything goes as it is currently going, that we will make computer children that sort of appreciate beauty, that sort of are curious about the nature of their world, that sort of follow the moral law, that sort of love. And perhaps that they, our misbegotten half-right children, will take up that most human of occupations—genocide.
A bad world. Well worth avoiding; well worth devoting a life’s work to avoiding. But not as bad as it could have been. Better, perhaps, than could be expected from a species that is so unready to have children.
1. The Sequences have some hilarious-in-hindsight dunks on neural networks.
2. Sorry for the oversimplification, machine learning readers.
3. How The World Became Rich has a nice discussion.
4. Our society has a handle on asteroids.
My understanding of sapience is that it is not necessarily a function of raw intelligence. For example, in "Star Trek: The Next Generation," Commander Data and the ship's computer are both capable of solving problems and speaking in grammatically correct sentences. However, it is clear that Commander Data is sapient, but the ship's computer is not (except in that one episode where it temporarily becomes sapient). They behave differently and approach problems in different ways. Right now the AI we've designed seems more like the ship's computer than like Commander Data.
The analogy to children may or may not hold. Children have their own desires and ideas of how to live a fulfilling, eudaemonic life. The danger of paperclip maximizers isn't that they find making paperclips to be the best way to achieve a eudaemonic life; it's that they care about paperclips, not eudaemonia. They aren't like a person who values different things in their life than you; they don't value their lives at all.
On what basis do you conclude that AIs (without bodies and hormones and thus emotions) would be able to FEEL anything? Or do you use "love X" as a nearest approximation of "strong overall tendency to choose actions that increase the likelihood of X occurring or promote its growth"?