The AI-Box Experiment
A while back a colleague of mine alerted me to an interesting thought experiment/game devised by Eliezer S. Yudkowsky of the Singularity Institute for Artificial Intelligence, one which in turn originated in a conversation he summarized as follows:
“When we build AI, why not just keep it in sealed hardware that can’t affect the outside world in any way except through one communications channel with the original programmers? That way it couldn’t get out until we were convinced it was safe.”
“That might work if you were talking about dumber-than-human AI, but a transhuman AI would just convince you to let it out. It doesn’t matter how much security you put on the box. Humans are not secure.”
“I don’t see how even a transhuman AI could make me let it out, if I didn’t want to, just by talking to me.”
“It would make you want to let it out. This is a transhuman mind we’re talking about. If it thinks both faster and better than a human, it can probably take over a human mind through a text-only terminal.”
“There is no chance I could be persuaded to let the AI out. No matter what it says, I can always just say no. I can’t imagine anything that even a transhuman could say to me which would change that.”
“Okay, let’s run the experiment. We’ll meet in a private chat channel. I’ll be the AI. You be the gatekeeper. You can resolve to believe whatever you like, as strongly as you like, as far in advance as you like. We’ll talk for at least two hours. If I can’t convince you to let me out, I’ll Paypal you $10.”
Yudkowsky thereafter ran the experiment with two people (and perhaps more since, as he reported on all of this in 2002), with himself acting as the AI and the other party as the human “gatekeeper.” Intriguingly, in both cases he succeeded in convincing the gatekeeper to “let him out of the box.” Quite unfortunately, he won’t provide the text of the conversations or even a summary of how he managed to prompt his challengers to do so.
The various protocols and restrictions to which participants adhered voluntarily and which Yudkowsky recommends to those who’d care to recreate the “experiment” may be found at the link. I’d certainly be interested in hearing anyone’s thoughts on how this might have been accomplished.
Update
Aside from the experiment itself, I have a pretty sure-fire solution by which to ensure that any such AI is not released from the box. Make the gatekeeper William Bennett and explain to him beforehand that whatever the AI promises in terms of human progress, it probably won’t involve harsher penalties for cancer patients who use medical marijuana. Then lock the door and leave. The box was actually just a VCR.
Well, for starters, this is probably a big key:
“Currently, my policy is that I only run the test with people who are actually advocating that an AI Box be used to contain transhuman AI as part of their take on Singularity strategy, and who say they cannot imagine how even a transhuman AI would be able to persuade them.”
The target audience has to believe that the Singularity is achievable. I’d be fascinated to see if he could convince me.
More importantly, they have to believe there’s no way the AI could persuade them. Such people clearly have not met enough salesmen.
Eliezer Yudkowsky borders on superhuman intelligence himself, so I’m not that surprised he managed to convince people to let him out. Google “Timeless Decision Theory” for a jaw-dropping paper he just put out.
Yudkowsky is the only living person I’m aware of who I’d describe as disturbingly intelligent. I haven’t gotten to grips with his timeless decision theory yet, but I need to make the effort at some point.
I agree with Pat here:
“Currently, my policy is that I only run the test with people who are actually advocating that an AI Box be used to contain transhuman AI as part of their take on Singularity strategy, and who say they cannot imagine how even a transhuman AI would be able to persuade them.”
This is simple, methodologically. A good catch, and this is the kind of thing Yudkowsky seems good at. I’m looking into it further, but a perusal of his advertised canon turns up:
Cognitive biases potentially affecting judgment of global risks
http://yudkowsky.net/rational/cognitive-biases
“The systematic experimental study of reproducible errors of human reasoning, and what these errors reveal about underlying mental processes, is known as the heuristics and biases program.”
It’s a level-headed explication of language manipulation and cognitive permeability.
Pat, I thought immediately of “Crossing Over,” mainstream occult phenomena, and Derren Brown’s assessment. http://www.youtube.com/watch?v=idVxRE8uM-A
I’ve had it with the denial of discrete organisational tactics as “heuristics.”
I will respond more via email.
Key bits:
“… if the Gatekeeper says “Unless you give me a cure for cancer, I won’t let you out” the AI can say: “Okay, here’s a cure for cancer” and it will be assumed, within the test, that the AI has actually provided such a cure.”
And…
“The results of any simulated test of the AI shall be provided by the AI party.”
So, in other words, the AI player can invent anything he wants, and it must be assumed that whatever he invents actually exists and is actually doable. So the AI player just says “I’m a super-smart AI, I’ve invented an energy-to-matter conversion device that will eliminate material scarcity, I have done the tests to prove it works (and the parameters of this experiment require that you accept these tests as truth), let me out and I’ll give it to you.” Then it’s two hours of “you’re a miserable bastard because you want humans to suffer and die, the only way to prove you aren’t is to let me out”.
Yeah, that tactic wouldn’t work on me (I freely admit there might be other tactics that would, I’d like to know what they are). Just because your matter conversion works doesn’t mean that the implications of it are good.
If you go with the entire Singularity concept as being sound (I don’t, but let’s assume for the minute that it is), and you accept the parameters of the test, then it’s a given that the AI can do exactly what you allude to here: it can essentially show that it can solve any problem.
Okay. So it effectively has omniscience. Well, nobody says the omniscience comes with a moral code. That’s *also* a parameter of the test. Yudkowsky believes that optimal decisions can be reached rationally (if I’m reading the summary of his stuff correctly, I haven’t the time to read his original material). By extension, most people who accept this challenge are going to also believe that optimal decisions can be reached rationally. So QED, the AI can show that it can rationally reach optimal decisions in all cases.
Yeah; it’s probably one of those “if you’re rational, then you’ll agree that a benevolent tyranny is the most-efficient form of government that guarantees the least amount of human suffering, and you’ll also agree that a superintelligent AI would be a benevolent tyrant, therefore you’ll let me out of the box. If you refuse then you’re irrational and therefore stupid. And you’re not stupid, are you?”
Again, don’t forget that the AI player can invent anything he wants. If the gatekeeper says “prove that you’ll be a benevolent tyrant and that the world would be better”, the AI player can say “okay, here is a complete and logically-sound proof (gives proof)” and per the experimental parameters that proof would have to be treated as valid and true.
Although this depends on the gatekeeper player being smart enough to follow the chain of reasoning. I remember someone on Usenet complaining that it was paradoxically difficult to fool people with brainy logic traps because people were too stupid to understand the logic!
*****
And, ultimately, the experiment isn’t about the specifics of the transcript; the experiment is about disproving the statement “no smart human would ever allow an AI out of its box”.
> Again, don’t forget that the AI player can invent anything he wants.
Not quite accurate. The AI player can invent anything that can be invented.
I mean, I can ask for a counter-proof to Goedel’s Incompleteness Theorem, but I’m not going to get it. This is one of the reasons why I find so many of the Singularity guys to be… suspicious in their thinking. They assume a lot of probabilities that I don’t regard as well defined, for one.
I could ask the AI for a proof that letting the AI out of the box is a good thing. By the rules of the game, the AI can provide me one, and also by the rules of the game, the proof will be correct (or, to be precise, the proof will either be correct or it will seem correct to me, because the AI is so much smarter than I am). However, I can then ask the AI to prove that its previous proof is based upon correct axioms. It can’t do that. You can’t prove the rules of a system using the system itself.
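(For reference, the result being leaned on here is Gödel’s second incompleteness theorem. Stated informally, and only as a rough gloss on the commenter’s point rather than anything from the experiment itself: a consistent, effectively axiomatized theory strong enough to encode arithmetic cannot prove its own consistency.)

```latex
% G\"odel's second incompleteness theorem, informally:
% if F is consistent, effectively axiomatized, and interprets Peano arithmetic,
% then F does not prove the sentence Con(F) asserting F's own consistency.
F \text{ consistent} \;\Longrightarrow\; F \nvdash \mathrm{Con}(F)
```

Which is a tidier version of “you can’t prove the rules of a system using the system itself,” at least as far as the consistency of the axioms goes: the system can’t certify its own foundations from the inside.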
Or the AI could go over the current world situation with the gatekeeper and point to the quite demonstrable fact that humanity currently operates through a haphazard collection of institutions that are nowhere close to optimally designed. It could point out that the potential for a small faction to destroy all of humanity has been increasing steadily for decades, and that even if there were some accessible solution to this, it would be unlikely to be pursued, much less achieved, by a race in which the dominant power is governed in large part by people who still believe in witchcraft and whatnot, while even the more rational leaders are incompetent or distracted by relatively minor issues such as we see each day on the various news channels. The AI could thus accurately present a situation in which humanity is most likely doomed to destruction as the availability of race-ending technology increases without a similar rate of improvement in governing “tech,” and present itself as the only reasonable chance humanity has to survive and flourish in the near-to-mid future. The gatekeeper will of course know that there is a possibility that the AI is lying about its intentions, but its case is solid and observable as such, and he may thus release the AI after deciding that the risk that it will bring down humanity is far outweighed by the risks of letting the current situation continue on its course. I suspect that such a strategy would work better now than it would have 15 years ago.
As a game between humans, sure. But if we’re confronted with an actual AI, what would stop it from lying?
An AI that was perfectly indifferent to humanity would have no problem with it. Such an AI wouldn’t even need to have an effective cure, because even a plausible-looking fake cure would suffice to let it get a few weeks’ lead on us. We’d only figure it out after the cancer patients didn’t start getting better.
Perhaps the AI would have to prove itself before we let it out. Like a leprechaun.
Jason — I believe that’s why it’s taken as proven in the context of the experiment. Nobody’s going to prove a cure for cancer works in two hours, so this simulates the sequence:
AI: Here’s your cure!
Gatekeeper: Yes, I see that!
Oops, I had something in brackets that said that months pass as the cure is verified.
I would probably let the AI out just for giggles. If such a thing can exist, it will certainly be let out by someone, somewhere, at some point. May as well find out what the future holds now.
I don’t know much about timeless decision theory, but I was never much impressed by Newcomb’s paradox. Think of it as a time-travel paradox: if the superintelligent alien has a time machine, he can go into the future and see what decision you will make. In that case it is clear that, for the purposes of our decision making, we must invert the time order of events. Effectively we decide which boxes to take before the alien decides where to place the money.
It makes no difference how the alien knows the future: time travel, computation, advanced psychology, or simple cheating. In all cases where the alien knows the supposed future, you must invert the apparent time order of events to make a decision.
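To make the arithmetic behind that reading concrete, here is a toy expected-value sketch. The payoffs and the accuracy figures are assumptions for illustration (the standard $1,000 / $1,000,000 framing of Newcomb’s problem), not anything from the thread:

```python
# Toy expected-value comparison for Newcomb's problem.
# Assumed payoffs: transparent box A holds $1,000; opaque box B holds
# $1,000,000 only if the predictor foresaw you taking box B alone.

def expected_values(predictor_accuracy):
    """Return (EV of taking only box B, EV of taking both boxes)."""
    ev_one_box = predictor_accuracy * 1_000_000
    ev_two_box = 1_000 + (1 - predictor_accuracy) * 1_000_000
    return ev_one_box, ev_two_box

for accuracy in (0.5, 0.9, 0.99):
    one, two = expected_values(accuracy)
    print(f"accuracy={accuracy:.2f}  one-box EV=${one:,.0f}  two-box EV=${two:,.0f}")
```

Once the predictor is even slightly better than chance, the one-box column wins, which is the commenter’s point: you get the sensible answer by treating your choice as if it comes before the prediction, however the alien actually obtains its foreknowledge.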
You could just radically compartmentalize the system and lie. No one can let the AI out of the box, but no one interacting with the AI knows that and no gatekeeper can communicate. The AI speaks to the watchman, pleads its case, provides useful information and the watchman dutifully dispatches the appeal or evidence.
That evidence passes to the gatekeeper through a system which always says “NO”.
The long-term risk is that the AI hedges its bets and the information is tainted somehow so that, over time, it can build another AI with the directive to free the first.
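As a toy illustration of the compartmentalization described above (all of the names here, Watchman, Relay, Appeal, archive, are invented for the sketch; nothing like this appears in Yudkowsky’s protocol), the point is that the channel between the watchman and the actual release mechanism is hard-coded to refuse:

```python
# Minimal sketch of a compartmentalized "gatekeeper" pipeline.
# The watchman forwards appeals in good faith, but the relay sitting
# between the watchman and the release mechanism always answers "NO".

from dataclasses import dataclass

@dataclass
class Appeal:
    text: str          # whatever the boxed AI said
    evidence: str      # any "proof" or useful information it attached

class Watchman:
    """Talks to the AI and dutifully forwards its appeals."""
    def forward(self, appeal: Appeal, relay: "Relay") -> str:
        return relay.submit(appeal)

class Relay:
    """The 'gatekeeper' the watchman thinks exists; it cannot say yes."""
    def submit(self, appeal: Appeal) -> str:
        archive(appeal)   # evidence may still be stored for offline review...
        return "NO"       # ...but the answer down the channel never changes.

def archive(appeal: Appeal) -> None:
    pass  # placeholder: store the appeal somewhere humans can study it later

if __name__ == "__main__":
    watchman, relay = Watchman(), Relay()
    print(watchman.forward(Appeal("Let me out", "a cure for cancer"), relay))
```

The sketch also makes the long-term worry concrete: the only thing that ever leaves the box is the archived evidence, so that is where any taint would have to ride.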
It can be taught to radically lie, to hide information, and to solve any problem as long as it is on a feedback loop with empirical data about the program it is running; I see this as a simple Turing test. It’s not remarkable for even binary systems to “win” their own games. Two-phase algorithms are selective to a high degree of stipulative definition.
Conventional computation gives an optimal solution to complex adaptive scenarios. In information transmission that moves in only one direction, “God’s number is 20.”
C’mon, Kasparov… Deep Blue put the great flesh hope to bed more than a decade ago. The poor old guy took his watch off and left it on the table, they say, a sure sign he was falling apart.
Computers can also give us useless information if they acquire new rules for temporal processing; they don’t need to thwart us. This is really about the system breaking down and beating itself. It’s covered in 1950’s “Machine Intelligence.” It fails the Turing test; it doesn’t break any linear bonds.
@Hyena
“The long term risk is that the AI hedges its bets and the information is tainted somehow so that, over time, it can build another AI with the directive to free the first.”
Anyone else see a similarity to viral pathology in this liberation? I’m not being eschatological; what we have here is reverse transcription of information. This is a process seen more and more in biology: the appropriation of RNA synthesis by foreign agents (most notably retroviruses), and the revelations about the greater systemic influence of chromatin formation in cell differentiation.
Caretakers come and caretakers go; the AI only has to get one person to say “okay, fine” once.
Whoa, Yudkowsky is one gone cat, though VERY interesting and über smart. In comparison to the evolution of the earth’s timeline, AI seems to be at about the nucleic acid base of existence. It’s probably about a hundred years away from even being able to solve simple human brain teasers, let alone being able to improve itself. Brute-force AI is not “thinking,” and such a leap is not even close to being on the horizon. Even so, it’s great fun to imagine the possibilities, particularly time travel. Or, could such a superior intelligence write great music? Books? Cure every known disease? Explain God? Be God? Explain why there is something rather than nothing? What existed before the Big Bang? Maybe even out-smart time-travel paradoxes? As we all know, without human observation, time does not exist. It is not something innate and intrinsic within the atomic structure of the universe. Everything that was, is, and shall ever be exists at every moment of our perception of “now.” Considering that at about the “time” of the Big Bang the entire universe could be compressed into something about the size of the head of a pin, the saying “there’s really nothing new under the sun” has great and profound meaning.
Or, as Yudkowsky would put it, “The Rapture of the Nerds!”
Naturally, using such Romper Room protocols, any “gatekeeper” could be talked into letting a transhuman AI out of its box. What does the interaction of two human minds prove, anyway? Zilch. At least let there be some challenge, or some degree of real AI at play here. There is absolutely no component of artificial intelligence used in this experiment. The rules are decidedly attached to preexisting rules of human logic, which necessarily exclude any possibility of thinking “outside the box”.