The AI-Box Experiment


Barrett Brown

I am the founder of the distributed think-tank Project PM and a regular inactive to Vanity Fair and Skeptical Inquirer. My work has also appeared in The Onion, National Lampoon, New York Press, D Magazine, Skeptic, McSweeney's, American Atheist, and a couple of newspapers in the U.S. and Mexico as well as a few policy journals. I'm the author of two books and serve as a consultant to various political entities and private clients.

Related Post Roulette

20 Responses

  1. Avatar Pat Cahalan says:

    Well, for starters, this is probably a big key:

    “Currently, my policy is that I only run the test with people who are actually advocating that an AI Box be used to contain transhuman AI as part of their take on Singularity strategy, and who say they cannot imagine how even a transhuman AI would be able to persuade them.”

    The target audience has to believe that the Singularity is achievable. I’d be fascinated to see if he could convince me.Report

    • Avatar Simon K says:

      More importantly, they have to believe there’s no way the AI could persuade them. Such people clearly have not met enough salesmen.Report

  2. Avatar CaptBackslap says:

    Eliezer Yudkowsky borders on superhuman intelligence himself, so I’m not that surprised he managed to convince people to let him out. Google “Timeless Decision Theory” for a jaw-dropping paper he just put out.Report

    • Avatar James K says:

      Yudkowsky is the only living person I’m aware of who I’d describe as disturbingly intelligent. I haven’t gotten to grips with his timeless decision theory yet, but I need to make the effort at some point.Report

  3. Avatar Kenneth Lipp says:

    I agree with Pat here:
    “Currently, my policy is that I only run the test with people who are actually advocating that an AI Box be used to contain transhuman AI as part of their take on Singularity strategy, and who say they cannot imagine how even a transhuman AI would be able to persuade them.”

    This is simple, methodologically. A good catch, and this is what Yudkowsky seems good at. I’m looking into it further, but a perusal of his advertised canon includes:

    Cognitive biases potentially affecting judgment of global risks

    “”The systematic experimental study of reproducible errors of human reasoning, and what
    these errors reveal about underlying mental processes, is known as the heuristics and biases.””

    It’s a level-headed explication of language manipulation and cognitive permeability.
    Pat, I thought immediately of “Crossing Over,” mainstream occult phenomena– and Derren Brown’s assessment.

    I’ve had it with the denial of discrete organisational tactics as “heuristics.”

    I will respond more via email.Report

  4. Avatar DensityDuck says:

    Key bits:

    “… if the Gatekeeper says “Unless you give me a cure for cancer, I won’t let you out” the AI can say: “Okay, here’s a cure for cancer” and it will be assumed, within the test, that the AI has actually provided such a cure.”


    “The results of any simulated test of the AI shall be provided by the AI party.”

    So, in other words, the AI player can invent anything he wants, and it must be assumed that whatever he invents actually exists and is actually doable. So the AI player just says “I’m a super-smart AI, I’ve invented an energy-to-matter conversion device that will eliminate material scarcity, I have done the tests to prove it works (and the parameters of this experiment require that you accept these tests as truth), let me out and I’ll give it to you.” Then it’s two hours of “you’re a miserable bastard because you want humans to suffer and die, the only way to prove you aren’t is to let me out”.Report

    • Avatar Pat Cahalan says:

      Yeah, that tactic wouldn’t work on me (I freely admit there might be other tactics that would, I’d like to know what they are). Just because your matter conversion works doesn’t mean that the implications of it are good.

      If you go with the entire Singularity concept as being sound (I don’t, but let’s assume for the minute that it is), and you accept the parameters of the test, then it’s a given that the AI can do exactly what you allude to here: it can essentially show that it can solve any problem.

      Okay. So it effectively has omniscience. Well, nobody says the omniscience comes with a moral code. That’s *also* a parameter of the test. Yudkowsky believes that optimal decisions can be reached rationally (if I’m reading the summary of his stuff correctly, I haven’t the time to read his original material). By extension, most people who accept this challenge are going to also believe that optimal decisions can be reached rationally. So QED, the AI can show that it can rationally reach optimal decisions in all cases.Report

      • Avatar DensityDuck says:

        Yeah; it’s probably one of those “if you’re rational, then you’ll agree that a benevolent tyranny is the most-efficient form of government that guarantees the least amount of human suffering, and you’ll also agree that a superintelligent AI would be a benevolent tyrant, therefore you’ll let me out of the box. If you refuse then you’re irrational and therefore stupid. And you’re not stupid, are you?”

        Again, don’t forget that the AI player can invent anything he wants. If the gatekeeper says “prove that you’ll be a benevolent tyrant and that the world would be better”, the AI player can say “okay, here is a complete and logically-sound proof (gives proof)” and per the experimental parameters that proof would have to be treated as valid and true.

        Although this depends on the gatekeeper player being smart enough to follow the chain of reasoning. I remember someone on USEnet complaining that it was paradoxically difficult to fool people with brainy logic traps because people were too stupid to understand the logic!


        And, ultimately, the experiment isn’t about the specifics of the transcript; the experiment is about disproving the statement “no smart human would ever allow an AI out of its box”.Report

        • Avatar Pat Cahalan says:

          > Again, don’t forget that the AI player can invent anything he wants.

          Not quite accurate. The AI player can invent anything that can be invented.

          I mean, I can ask for a counter-proof to Goedel’s Incompleteness Theorem, but I’m not going to get it. This is one of the reasons why I find so many of the Singularity guys to be… suspicious in their thinking. They assume a lot of probabilities that I don’t regard as well defined, for one.

          I could ask the AI for a proof that letting the AI out of the box is a good thing. By the rules of the game, the AI can provide me one, and also by the rules of the game, the proof will be correct (or, to be precise, the proof will either be correct or it will seem correct to me, because the AI is so much smarter than I am). However, I can then ask for the AI to prove that its previous proof is based upon correct axioms. It can’t do that. You can’t prove the rules of a system using the system itself.Report

    • Avatar Barrett Brown says:

      Or the AI could go over the current world situation with the gatekeeper and point to the quite demonstrable fact that humanity currently operates by way of a haphazard collection of institutions that are not anything close to optimally designed, point out that meanwhile the potential for a small faction to destroy all of humanity has been increasing steadily for decades, point out that even if there were some accessible solution to this it would be unlikely to be pursued – much less achieved – by a race in which the dominant power is governed in large part by people who still believe in witchcraft and whatnot while even the more rational leaders are incompetent or distracted by relatively minor issues such as we see each day on the various news channels. The AI could thus accurately present a situation in which humanity is most likely doomed to destruction as the availability to race-ending technology increases without a similar rate of improvement in governing “tech,” and present himself as the only reasonable chance humanity has to survive and flourish in the near-to-mid future. The gatekeeper will of course know that there is a possibility that the AI is lying about his intentions, but his case is solid and observable as such, and he will thus perhaps release the AI after having decided that the risk that it will bring down humanity is far outweighed by the risks present if the current situation continues on its course. I suspect that such a strategy would work better now than it would have 15 years ago.Report

    • Avatar Jason Kuznicki says:

      As a game between humans, sure. But if we’re confronted with an actual AI, what would stop it from lying?

      “Let me out and I give you a cure for cancer.”

      “How do I know you’re not lying?”

      “It would be horrible to lie about something like that, wouldn’t it? I might as well just tell the truth, right?”

      “Well, yes.” (Lets AI out.) “Now where’s my cure?”

      “Here.” (Hands human a sheet of paper with some notes on it; teleports to Andromeda.)

      “It appears to be…. a recipe for chicken soup.”

      An AI that was perfectly indifferent to humanity would have no problem with it. Such an AI wouldn’t even need to have an effective cure, because even a plausible-looking fake cure would suffice to let him get a few weeks’ lead on us. We’d only figure it out after the cancer patients didn’t start getting better.

      Perhaps the AI would have to prove itself before we let it out. Like a leprechaun.Report

  5. Avatar ppnl says:

    I would probably let the AI out just for giggles. If such a thing can exist it will certainly be let out by someone somewhere in time. May as well find out what the future holds now.

    I don’t know much about timeless decision theory but I was never much impressed by Newcomb’s paradox. Think of it as a time travel paradox. If the super intelligent alien has a time machine he can go into the future and see what decision you will make. In this case it is clear that for the purposes of our decision making we must invert the time order of events. Effectively we decide which boxes to take before the alien decides where to place the money.

    It makes no difference how the alien knows the future. Time travel, computation, advanced psychology or simple cheating. In all cases the alien knows the supposed future you must invert the apparent time order of events to make a decision.Report

  6. Avatar Hyena says:

    You could just radically compartmentalize the system and lie. No one can let the AI out of the box, but no one interacting with the AI knows that and no gatekeeper can communicate. The AI speaks to the watchman, pleads its case, provides useful information and the watchman dutifully dispatches the appeal or evidence.

    That evidence passes to the gatekeeper through a system which always says “NO”.

    The long term risk is that the AI hedges its bets and the information is tainted somehow so that, over time, it can build another AI with the directive to free the first.Report

  7. Avatar Kenneth Lipp says:

    It can be taught to radically lie, to hide information, to solve any problem as long as it is on a feedback loop with empirical data about the program it is running, I see this as simple Turing test. It’s not remarkable for even binary systems to “win” their own games. Two phase algorithms are selective to a high degree of stipulative definition.

    Conventional computation gives an optimal solution to complex adaptive scenarios. In information transmission which moves only one direction “God’s number is 20.”
    C’mon, Kasparov…Deep Blue put the great flesh hope to bed more than a decade ago. Poor old guy took his watch off and left it on the table, they say, a sure sign he was falling apart.

    Computers can also give us useless information if they acquisition new rules for temporal processing, they don’t need to thwart us, this is really about the system breaking down–beating itself. It’s covered in 1950’s “Machine Intelligence” It fails the Turing, it doesn’t break any linear bonds.

    “The long term risk is that the AI hedges its bets and the information is tainted somehow so that, over time, it can build another AI with the directive to free the first.”

    Anyone else see a similarity to viral pathology in this liberation? I’m not being eschatological-so we have reverse transcription of information. This is a process seen more and more in biology, the appropriating of RNA synthesis by foreign agents (most notably retroviruses), and the revelations about greater systemic influence of chromatin formation in cell differentiation.Report

  8. Avatar Jaybird says:

    Caretakers come and caretakers go, the AI only has to get one person to say “okay fine” once.Report

  9. Avatar Heidegger says:

    Whoa, Yudkowsky is one gone cat, though VERY interesting and über smart. In comparison to the evolution of the earth’s timeline, AI seems to be at about the nucleic acid base of existence. It’s probably about a hundred years aways away of even being able to solve simple human brain teasers, let alone, being able to self-improve itself. Brute force AI is not “thinking”, and such a leap is not even close to being on the horizon. Even so, it’s great fun to imagine the possibilities, particularly time-travel. Or, could such a superior intelligence write great music? Books? Cure every known disease? Explain God? Be God? Explain why there is something rather rather nothing? What existed before the Big Bang? Maybe even being able to out-smart time travel paradoxes? As we all know, without human observation, time does not exist. It is not something that is innate and intrinsic within the atomic structure of the universe. Everything that was, is, and shall ever be, exists at every moment of human’s perception of “now”. Considering at about the “time” of the Big bang, the entire universe could be compressed into something about the size of the head of a pin, the saying, “there’s really nothing new under the sun” has great and profound meaning.Report

  10. Avatar Heidegger says:

    Or, as Yudkowsky would put it, “The Rapture of the Nerds!”Report

  11. Avatar Heidegger says:

    Naturally, using such Romper Room protocols, any “gatekeeper” could be talked into letting a transhuman AI out its box. What does the interaction of two human minds prove, anyway? Zilch. At least let their be some challenge or some degree of real AI at play, here. There is absolutely no component of artificial intelligence used in this experiment. The rules are decidedly attached to preexisting rules of human logic which necessarily excludes any possibility of thinking “outside the box”.Report