In the previous episode, Eliezer2001 is fighting a rearguard action against the truth. Only gradually shifting his beliefs, admitting an increasing probability in a different scenario, but never saying outright, “I was wrong before.” He repairs his strategies as they are challenged, finding new justifications for just the same plan he pursued before.
(Of which it is therefore said: “Beware lest you fight a rearguard retreat against the evidence, grudgingly conceding each foot of ground only when forced, feeling cheated. Surrender to the truth as quickly as you can. Do this the instant you realize what you are resisting; the instant you can see from which quarter the winds of evidence are blowing against you.”)
Memory fades, and I can hardly bear to look back upon those times—no, seriously, I can’t stand reading my old writing. I’ve already been corrected once in my recollections, by those who were present. And so, though I remember the important events, I’m not really sure what order they happened in, let alone what year.
But if I had to pick a moment when my folly broke, I would pick the moment when I first comprehended, in full generality, the notion of an optimization process. That was the point at which I first looked back and said, “I’ve been a fool.”
Previously, in 2002, I’d been writing a bit about the evolutionary psychology of human general intelligence—though at the time, I thought I was writing about AI; at this point I thought I was against anthropomorphic intelligence, but I was still looking to the human brain for inspiration. (The paper in question is “Levels of Organization in General Intelligence,” a requested chapter for the volume Artificial General Intelligence,1 which finally came out in print in 2007.)
So I’d been thinking (and writing) about how natural selection managed to cough up human intelligence; I saw a dichotomy between the two: the blindness of natural selection and the lookahead of intelligent foresight, reasoning by simulation versus playing everything out in reality, abstract versus concrete thinking. And yet it was natural selection that created human intelligence, so that our brains, though not our thoughts, are entirely made according to the signature of natural selection.
To this day, this still seems to me like a reasonably shattering insight, and so it drives me up the wall when people lump together natural selection and intelligence-driven processes as “evolutionary.” They really are almost absolutely different in a number of important ways—though there are concepts in common that can be used to describe them, like consequentialism and cross-domain generality.
But that Eliezer2002 is thinking in terms of a dichotomy between evolution and intelligence tells you something about the limits of his vision—like someone who thinks of politics as a dichotomy between conservative and liberal stances, or someone who thinks of fruit as a dichotomy between apples and strawberries.
After the “Levels of Organization” draft was published online, Emil Gilliam pointed out that my view of AI seemed pretty similar to my view of intelligence. Now, of course Eliezer2002 doesn’t espouse building an AI in the image of a human mind; Eliezer2002 knows very well that a human mind is just a hack coughed up by natural selection. But Eliezer2002 has described these levels of organization in human thinking, and he hasn’t proposed using different levels of organization in the AI. Emil Gilliam asks whether I think I might be hewing too close to the human line. I dub the alternative the “Completely Alien Mind Design” and reply that a CAMD is probably too difficult for human engineers to create, even if it’s possible in theory, because we wouldn’t be able to understand something so alien while we were putting it together.
I don’t know if Eliezer2002 invented this reply on his own, or if he read it somewhere else. Needless to say, I’ve heard this excuse plenty of times since then. In reality, what you genuinely understand, you can usually reconfigure in almost any sort of shape, leaving some structural essence inside; but when you don’t understand flight, you suppose that a flying machine needs feathers, because you can’t imagine departing from the analogy of a bird.
So Eliezer2002 is still, in a sense, attached to humanish mind designs—he imagines improving on them, but the human architecture is still in some sense his point of departure.
What is it that finally breaks this attachment?
It’s an embarrassing confession: It came from a science fiction story I was trying to write. (No, you can’t see it; it’s not done.) The story involved a non-cognitive non-evolutionary optimization process, something like an Outcome Pump. Not intelligence, but a cross-temporal physical effect—that is, I was imagining it as a physical effect—that narrowly constrained the space of possible outcomes. (I can’t tell you any more than that; it would be a spoiler, if I ever finished the story. Just see the essay on Outcome Pumps.) It was “just a story,” and so I was free to play with the idea and elaborate it out logically: C was constrained to happen, therefore B (in the past) was constrained to happen, therefore A (which led to B) was constrained to happen.
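To make that backward chain concrete, here is a minimal sketch, not anything from the story itself. The event names, the `LEADS_TO` table, and the `constrained_ancestors` helper are all made up for illustration. Given a toy map of which events lead to which, it fixes the final outcome and works backward to find which earlier events are still compatible with it.

```python
# Hypothetical toy world: each event maps to the events it can lead to.
# The names and structure are illustrative assumptions, not from the story.
LEADS_TO = {
    "A": {"B"},
    "A_prime": {"B_prime"},
    "B": {"C"},
    "B_prime": {"C_prime"},
    "C": set(),
    "C_prime": set(),
}


def constrained_ancestors(leads_to, required):
    """Return every event that can still lead to `required`, including itself."""
    allowed = {required}
    changed = True
    # Keep sweeping until no new event can be added to the allowed set.
    while changed:
        changed = False
        for event, successors in leads_to.items():
            if event not in allowed and successors & allowed:
                allowed.add(event)
                changed = True
    return allowed


# Constrain the outcome to C and see what the past is forced to contain.
allowed = constrained_ancestors(LEADS_TO, "C")
print(sorted(allowed))
```

In this toy world every event has exactly one successor, so fixing C rules out the whole primed branch. Nothing in the loop wants anything; the constraint on the end state is simply carried back along the causal arrows.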
Drawing a line through one point is generally held to be dangerous. Two points make a dichotomy; you imagine them opposed to one another. But when you’ve got three different points—that’s when you’re forced to wake up and generalize.
Now I had three points: Human intelligence, natural selection, and my fictional plot device.
And so that was the point at which I generalized the notion of an optimization process, of a process that squeezes the future into a narrow region of the possible.
This may seem like an obvious point, if you’ve been following Overcoming Bias this whole time; but if you look at Shane Legg’s collection of 71 definitions of intelligence, you’ll see that “squeezing the future into a constrained region” is a less obvious reply than it seems.
Many of the definitions of “intelligence” by AI researchers do talk about “solving problems” or “achieving goals.” But from the viewpoint of past Eliezers, at least, it is only hindsight that makes this the same thing as “squeezing the future.”
A goal is a mentalistic object; electrons have no goals, and solve no problems either. When a human imagines a goal, they imagine an agent imbued with wanting-ness—it’s still empathic language.
You can espouse the notion that intelligence is about “achieving goals”—and then turn right around and argue about whether some “goals” are better than others—or talk about the wisdom required to judge between goals themselves—or talk about a system deliberately modifying its goals—or talk about the free will needed to choose plans that achieve goals—or talk about an AI realizing that its goals aren’t what the programmers really meant to ask for. If you imagine something that squeezes the future into a narrow region of the possible, like an Outcome Pump, those seemingly sensible statements somehow don’t translate.
So for me at least, seeing through the word “mind” to a physical process that would, just by naturally running, just by obeying the laws of physics, end up squeezing its future into a narrow region, was a naturalistic enlightenment over and above the notion of an agent trying to achieve its goals.
It was like falling out of a deep pit, falling into the ordinary world, strained cognitive tensions relaxing into unforced simplicity, confusion turning to smoke and drifting away. I saw the work performed by intelligence; smart was no longer a property, but an engine. Like a knot in time, echoing the outer part of the universe in the inner part, and thereby steering it. I even saw, in a flash of the same enlightenment, that a mind had to output waste heat in order to obey the laws of thermodynamics.
Previously, Eliezer2001 had talked about Friendly AI as something you should do just to be sure—if you didn’t know whether AI design X was going to be Friendly, then you really ought to go with AI design Y that you did know would be Friendly. But Eliezer2001 didn’t think he knew whether you could actually have a superintelligence that turned its future light cone into paperclips.
Now, though, I could see it—the pulse of the optimization process, sensory information surging in, motor instructions surging out, steering the future. In the middle, the model that linked up possible actions to possible outcomes, and the utility function over the outcomes. Put in the corresponding utility function, and the result would be an optimizer that would steer the future anywhere.
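The middle of that picture can be sketched in a few lines, under heavy simplifying assumptions. This is a one-shot expected-utility chooser, not any design I actually wrote down. The sensory side is omitted, and the action names, the probabilities in `toy_model`, and both utility functions are invented for the example.

```python
def choose_action(actions, model, utility):
    """Return the action whose predicted outcomes have the highest expected utility.

    `model(action)` returns a dict mapping outcome -> probability.
    `utility(outcome)` returns a number.
    """
    def expected_utility(action):
        return sum(p * utility(outcome) for outcome, p in model(action).items())

    return max(actions, key=expected_utility)


# Hypothetical toy model linking possible actions to possible outcomes.
def toy_model(action):
    table = {
        "build_paperclip_factory": {"many_paperclips": 0.9, "nothing": 0.1},
        "plant_garden": {"flowers": 0.8, "nothing": 0.2},
        "do_nothing": {"nothing": 1.0},
    }
    return table[action]


ACTIONS = ["build_paperclip_factory", "plant_garden", "do_nothing"]


def paperclip_utility(outcome):
    return 1.0 if outcome == "many_paperclips" else 0.0


def flower_utility(outcome):
    return 1.0 if outcome == "flowers" else 0.0


# Same model, same optimizer; only the utility function differs.
paperclip_choice = choose_action(ACTIONS, toy_model, paperclip_utility)
flower_choice = choose_action(ACTIONS, toy_model, flower_utility)
print(paperclip_choice, flower_choice)
```

The chooser is the same in both calls. Swapping the utility function is the only thing that changes where the future gets steered, and nothing in `choose_action` prefers flowers to paperclips.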
Up until that point, I’d never quite admitted to myself that Eliezer1997’s AI goal system design would definitely, no two ways about it, pointlessly wipe out the human species. Now, however, I looked back, and I could finally see what my old design really did, to the extent it was coherent enough to be talked about. Roughly, it would have converted its future light cone into generic tools—computers without programs to run, stored energy without a use…
… how on Earth had I, the fine and practiced rationalist—how on Earth had I managed to miss something that obvious, for six damned years?
That was the point at which I awoke clear-headed, and remembered; and thought, with a certain amount of embarrassment: I’ve been stupid.