These AI makers aren’t mad scientists or people any different from you and me—you’ll meet several in this book. But recall the availability bias from chapter 2. When faced with a decision, humans will choose the option that’s recent, dramatic, or otherwise front and center. Annihilation by AI isn’t generally available to AI makers. Not as available as making advances in their field, getting tenure, publishing, getting rich, and so on.
In fact, not many AI makers, in contrast to AI theorists, are concerned with building Friendly AI. With one exception, none of the dozen or so AI makers I’ve spoken with are worried enough to work on Friendly AI or any other defensive measure. Maybe the thinkers overestimate the problem, or maybe the makers’ problem is not knowing what they don’t know. In a much-read online paper, Yudkowsky put it like this:
The human species came into existence through natural selection, which operates through the nonchance retention of chance mutations. One path leading to global catastrophe—to someone pressing the button with a mistaken idea of what the button does—is that Artificial Intelligence comes about through a similar accretion of working algorithms, with the researchers having no deep understanding of how the combined system works. [italics mine]
Not knowing how to build a Friendly AI is not deadly, of itself.… It’s the mistaken belief that an AI will be friendly which implies an obvious path to global catastrophe.
Assuming that human-level AIs (AGIs) will be friendly is wrong for a lot of reasons. The assumption becomes even more dangerous after the AGI’s intelligence rockets past ours, and it becomes ASI—artificial superintelligence. So how do you create Friendly AI? Or could you impose friendliness on advanced AIs after they’re already built? Yudkowsky has written a book-length online treatise about these questions entitled Creating Friendly AI: The Analysis and Design of Benevolent Goal Architectures. Friendly AI is a subject so dense yet so important that it exasperates even its chief proponent, who says of it, “it only takes one error for a chain of reasoning to end up in Outer Mongolia.”
Let’s start with a simple definition. Friendly AI is AI that has a positive rather than a negative impact on mankind.
Friendly AI pursues goals, and it takes action to fulfill those goals. To describe an AI’s success at achieving its goals, theorists use a term from economics: utility. As you might recall from Econ 101, consumers behaving rationally seek to maximize utility by spending their resources in the way that gives them the most satisfaction. Generally speaking, for an AI, satisfaction is gained by achieving goals, and an act that moves it toward achieving its goals has high “utility.”
Values and preferences, in addition to goal satisfaction, can be packed into an AI’s definition of utility, called its “utility function.” Being friendly to humans is one such value we’d like AIs to have, so that no matter what an AI’s goals—from playing chess to driving cars—preserving human values (and humans themselves) remains an essential part of its code.
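To make “utility function” concrete, here is a minimal sketch in Python (not from the book), assuming an invented driving AI whose candidate actions are scored on both goal progress and human-value preservation; the weights, names, and numbers are illustrative only.

```python
# Hypothetical sketch: scoring candidate actions with a utility function
# that values both goal progress and the preservation of human values.

def utility(task_progress, human_value_score):
    """Combine goal progress (0-1) and human-value preservation (0-1)
    into a single utility number, weighting human values heavily."""
    return 0.2 * task_progress + 0.8 * human_value_score

# Invented candidate actions: (task_progress, human_value_score)
candidates = {
    "drive passenger to the airport safely": (0.9, 1.0),
    "drive to the airport at 150 mph":       (1.0, 0.1),
}

best = max(candidates, key=lambda action: utility(*candidates[action]))
print(best)  # -> drive passenger to the airport safely
```

The heavy weight on the human-value term is the point: no amount of raw goal progress should be able to outbid a serious violation of human values.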
Now, friendly here doesn’t mean Mister Rogers–friendly, though that wouldn’t hurt. It means that AI should be neither hostile nor ambivalent toward humans, forever, no matter what its goals are or how many self-improving iterations it goes through. The AI must have an understanding of our nature so deep that it doesn’t harm us through unintended consequences, like those caused by Asimov’s Three Laws of Robotics. That is, we don’t want an AI that meets our short-term goals—please save us from hunger—with solutions detrimental in the long term—by roasting every chicken on earth—or with solutions to which we’d object—by killing us after our next meal.
As an example of unintended consequences, Oxford University ethicist Nick Bostrom suggests the hypothetical “paper clip maximizer.” In Bostrom’s scenario, a thoughtlessly programmed superintelligence whose sole goal is to manufacture paper clips does exactly as it is told, without regard to human values. It all goes wrong because it sets about “transforming first all of earth and then increasing portions of space into paper clip manufacturing facilities.” A Friendly AI would make only as many paper clips as were compatible with human values.
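A toy sketch of the failure mode Bostrom describes, comparing an objective that counts only paper clips with one that treats human values as a hard constraint; the plan names and scores are invented for illustration.

```python
# Toy version of the paper clip maximizer: a plan scorer that counts only
# clips, versus one that rules out plans violating human values.

plans = {
    "run one clip factory":              {"clips": 1e6,  "harms_humans": False},
    "convert Earth into clip factories": {"clips": 1e20, "harms_humans": True},
}

def naive_score(plan):
    return plan["clips"]                 # nothing but clip count matters

def friendly_score(plan):
    if plan["harms_humans"]:             # human values act as a hard constraint
        return float("-inf")
    return plan["clips"]

print(max(plans, key=lambda name: naive_score(plans[name])))     # convert Earth into clip factories
print(max(plans, key=lambda name: friendly_score(plans[name])))  # run one clip factory
```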
Another tenet of Friendly AI is to avoid dogmatic values. What we consider to be good changes with time, and any AI involved with human well-being will need to stay up to speed. If in its utility function an AI sought to preserve the preferences of most Europeans in 1700 and never upgraded them, in the twenty-first century it might link our happiness and welfare to archaic values like racial inequality and slaveholding, gender inequality, shoes with buckles, and worse. We don’t want to lock specific values into Friendly AI. We want a moving scale that evolves with us.
Yudkowsky has devised a name for the ability to “evolve” norms—Coherent Extrapolated Volition. An AI with CEV could anticipate what we would want. And not only what we would want, but what we would want if we “knew more, thought faster, and were more the people we thought we were.”
CEV would be an oracular feature of Friendly AI. It would have to derive from us our values as if we were better versions of ourselves, and be democratic about it so that humankind is not tyrannized by the norms of a few.
Does this sound a little starry-eyed? Well, there are good reasons for that. First, I’m giving you a highly summarized account of Friendly AI and CEV, concepts you can read volumes about online. And second, the whole topic of Friendly AI is incomplete and optimistic. It’s unclear whether or not Friendly AI can be expressed in a formal, mathematical sense, and so there may be no way to build it or to integrate it into promising AI architectures. But if we could, what would the future look like?
* * *
Let’s say that sometime, ten to forty years from now, IBM’s SyNAPSE project to reverse engineer the brain has borne fruit. Jump-started in 2008 with a nearly $30 million grant from DARPA, IBM’s system copies the mammalian brain’s basic technique: simultaneously taking in thousands of sources of input, evolving its core processing algorithms, and outputting perception, thought, and action. It started out at the scale of a cat’s brain, but it has since scaled to human size, and then beyond.
To build it, the researchers of SyNAPSE (Systems of Neuromorphic Adaptive Plastic Scalable Electronics) created a “cognitive computer” made up of thousands of parallel processing computer chips. Taking advantage of developments in nanotechnology, they built chips one square micron in size. Then they arrayed the chips in a carbon sphere the size of a basketball, and suspended it in gallium aluminum alloy, a liquid metal, for maximum conductivity.
The tank holding it, meanwhile, is a powerful wireless router connected to millions of sensors distributed around the planet, and linked to the Internet. These sensors gather input from cameras, microphones, pressure and temperature gauges, robots, and natural systems—deserts, glaciers, lakes, rivers, oceans, and rain forests. SyNAPSE processes the information by automatically learning the features and relationships revealed in the massive amounts of data. Function follows form as neuromorphic, brain-imitating hardware autonomously gives rise to intelligence.
Now SyNAPSE mirrors the human brain’s thirty billion neurons and hundred trillion connecting points, or synapses. And it’s surpassed the brain’s approximately thousand trillion operations per second.
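That “thousand trillion” figure is the usual back-of-envelope estimate: on the order of a hundred trillion synapses, each active roughly ten times per second. A quick check of the arithmetic, with the firing rate an assumed round number:

```python
synapses = 100e12         # ~one hundred trillion connections, per the text
events_per_second = 10    # assumed average rate; order-of-magnitude only
print(f"{synapses * events_per_second:.0e} operations per second")  # 1e+15, i.e., a thousand trillion
```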
For the first time, the human brain is the second-most-complex object in the known universe.
And friendliness? Knowing that “friendliness” had to be a core part of any intelligent system, its makers encoded values and safety into each of SyNAPSE’s millions of chips. It is friendly down to its DNA. Now as the cognitive computer grows more powerful it makes decisions that impact the world—how to handle the AIs of terrorist states, for example, how to divert an approaching asteroid, how to stop the sea level’s rapid rise, how to speed the development of nano-medicines that will cure most diseases.
With its deep understanding of humans, SyNAPSE extrapolates with ease what we would choose if we were powerful and intelligent enough to take part in these high-level judgments. In the future, we survive the intelligence explosion! In fact, we thrive.
God bless you, Friendly AI!
* * *
Now that most (but not all) AI makers and theorists have recognized Asimov’s Three Laws of Robotics for what they were meant to be—tools for drama, not survival—Friendly AI may be the best concept humans have come up with for planning their survival. But besides not being ready yet, it’s got other big problems.
First, there are too many players in the AGI sweepstakes. Too many organizations in too many countries are working on AGI and AGI-related technologies for them all to agree to mothball their projects until Friendly AI is created, or to include in their code a formal friendliness module, if one could be made. And few are even taking part in the public dialogue about the necessity for Friendly AI.
Some of the AGI contestants include: IBM (with several AGI-related projects), Numenta, AGIRI, Vicarious, Carnegie Mellon’s NELL and ACT-R, SNERG, LIDA, CYC, and Google. At least a dozen more, such as SOAR, Novamente, NARS, AIXItl, and Sentience, are being developed with less certain sources of funding. Hundreds more projects wholly or partially devoted to AGI exist at home and abroad, some cloaked in stealth, and some hidden behind modern-day “iron curtains” of national security in countries such as China and Israel. DARPA publicly funds many AI-related projects, but of course it funds others covertly, too.
My point is that it’s unlikely MIRI will create the first AGI out of the box with friendliness built in. And it’s unlikely that the first AGI’s creators will think hard about issues like friendliness. Still, there is more than one way to block unfriendly AGI. MIRI President Michael Vassar told me about the organization’s outreach program aimed at elite universities and mathematics competitions. With a series of “rationality boot camps” MIRI and its sister organization, the Center for Applied Rationality (CFAR), hope to train tomorrow’s potential AI builders and technology policy makers in the discipline of rational thinking. When these elites grow up, they’ll use that education in their work to avoid AI’s most harrowing pitfalls.
Quixotic as this scheme may sound, MIRI and CFAR have their fingers on an important factor in AI risk. The Singularity is trending high, and Singularity issues will come to the attention of more and smarter people. A window for education about AI risk is starting to open. But any plan to create an advisory board or governing body over AI is already too late to avoid some kinds of disasters. As I mentioned in chapter 1, at least fifty-six countries are developing robots for the battlefield. At the height of the U.S. occupation of Iraq, three Foster-Miller SWORDS—machine-gun-wielding robot “drones”—were removed from combat after they allegedly pointed their guns at “friendlies.” In 2007 in South Africa, a robotic antiaircraft gun killed nine soldiers and wounded fifteen in an incident lasting an eighth of a second.
These aren’t full-blown Terminator incidents, but look for more of them ahead. As advanced AI becomes available, particularly if it’s paid for by DARPA and like agencies in other countries, nothing will stop it from being installed in battlefield robots. In fact, robots may be the platforms for embodied machine learning that will help create advanced AI to begin with. When Friendly AI is available, if ever, why would privately run robot-making companies install it in machines designed to kill humans? Shareholders wouldn’t like that one bit.
Another problem with Friendly AI is this—how will friendliness survive an intelligence explosion? That is, how will Friendly AI stay friendly even after its IQ has grown by a thousand times? In his writing and lectures, Yudkowsky employs a pithy shorthand for describing how this could happen:
Gandhi doesn’t want to kill people. If you offered Gandhi a pill that made him want to kill people, he would refuse to take it, because he knows that then he would kill people, and the current Gandhi doesn’t want to kill people. This, roughly speaking, is an argument that minds sufficiently advanced to precisely modify and improve themselves, will tend to preserve the motivational framework they started in.
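The argument can be read as a decision rule: a self-modifying agent evaluates any proposed change to itself with the utility function it has now, so a change that would alter its goals scores badly by those very goals. A minimal sketch under invented names, assuming the agent can predict the outcome of each modification:

```python
# Sketch of the Gandhi argument: an agent judges proposed self-modifications
# with its *current* utility function and rejects ones that change its goals.

def current_utility(outcome):
    return -outcome["people_killed"]     # Gandhi's present goal: minimize killing

status_quo = {"people_killed": 0}

proposed_modifications = {
    "pill that makes me think faster": {"people_killed": 0},
    "pill that makes me want to kill": {"people_killed": 1000},
}

for name, predicted_outcome in proposed_modifications.items():
    accept = current_utility(predicted_outcome) >= current_utility(status_quo)
    print(f"{name}: {'accept' if accept else 'reject'}")
```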
Yudkowsky’s Gandhi argument didn’t make sense to me. If we cannot know what a smarter-than-human intelligence will do, how can we know whether it will retain its utility function, or core set of beliefs? Might it not consider and reject its programmed friendliness once it’s a thousand times smarter?
“Nope,” Yudkowsky replied when I asked. “It becomes a thousand times more effective in preserving its utility function.”
But what if there is some kind of category shift once something becomes a thousand times smarter than we are, and we just can’t see it from here? For example, we share a lot of DNA with flatworms. But would we be invested in their goals and morals even if we discovered that many millions of years ago flatworms had created us, and given us their values? After we got over the initial surprise, wouldn’t we just do whatever we wanted?
“It’s very clear why one would be suspicious of that,” Yudkowsky said. “But creating Friendly AI is not like giving instructions to a human. Humans have their own goals already, they have their own emotions, they have their own enforcers. They have their own structure for reasoning about moral beliefs. There is something inside that looks over any instruction you give them and decides whether to accept it or reject it. With the AI you are shaping the entire mind from scratch. If you subtract the AI’s code what you are left with is a computer that is not doing anything because it has no code to run.”
Still, I said, “If tomorrow I were a thousand times smarter than today, I think I’d look back at what I was worried about today and be ‘so over that.’ I can’t believe that much of what I valued yesterday would matter to my new thousand-power mind.”