Short Status Report on the Abilities of AI
Back in 2004, the movie I, Robot came out (starring Will Smith) and it contained the above scene. It’s set in Chicago in 2035 (hey, 10 years from now!) when robots are ubiquitous and, yep, they’re all run by AI. The interaction between Will Smith and the robot follows:
Will Smith: Can a robot write a symphony? Can a robot turn a canvas into a beautiful masterpiece?
Robot: Can you?
Well, Twitter’s very own Linch pointed out a bet that Gary Marcus is making with the (former) AI Engineer Miles Brundage:
Can AI do 8 of these 10 by the end of 2027?
1. Watch a previously unseen mainstream movie (without reading reviews etc) and be able to follow plot twists and know when to laugh, and be able to summarize it without giving away any spoilers or making up anything that didn’t actually happen, and be able to answer questions like who are the characters? What are their conflicts and motivations? How did these things change? What was the plot twist?
2. Similar to the above, be able to read new mainstream novels (without reading reviews etc) and reliably answer questions about plot, character, conflicts, motivations, etc, going beyond the literal text in ways that would be clear to ordinary people.
3. Write engaging brief biographies and obituaries without obvious hallucinations that aren’t grounded in reliable sources.
4. Learn and master the basics of almost any new video game within a few minutes or hours, and solve original puzzles in the alternate world of that video game.
5. Write cogent, persuasive legal briefs without hallucinating any cases.
6. Reliably construct bug-free code of more than 10,000 lines from natural language specification or by interactions with a non-expert user. [Gluing together code from existing libraries doesn’t count.]
7. With little or no human involvement, write Pulitzer-caliber books, fiction and non-fiction.
8. With little or no human involvement, write Oscar-caliber screenplays.
9. With little or no human involvement, come up with paradigm-shifting, Nobel-caliber scientific discoveries.
10. Take arbitrary proofs from the mathematical literature written in natural language and convert them into a symbolic form suitable for symbolic verification.
Check out 7, 8, and 9 again.
Now look at the interaction between Will Smith and the robot again.
It feels like the goalposts have shifted. We’ve reached the point where AI can write symphonies and it’s not a big deal. Whether it’s capable of turning a canvas into a beautiful masterpiece is probably up for debate but if we’re allowed to discuss CGI, I think I can get away with using this as an example. I asked Dall-E to make me a “silk-screen painting of a night scene, on a river, where boats with lamps are sailing past”. It gave me this in fewer than 10 seconds:
It’s not, you know, a Monet or anything like that but if my friend brought something similar home from a “drink wine, paint a painting” class, my eyes would bug out of my head and I’d say “That’s amazing!”
So, of course, we’re stuck with either moving the goalposts or admitting that, yeah, they’ve been met.
And now we’re at a place where we’re asking for AI to demonstrate that it’s capable of doing something that only a handful of people on the planet are capable of. Or, getting down into the weeds, if the AI wrote a screenplay that was better than Green Book (2018 Oscar Winner) or Crash (2005 Oscar Winner), would that count, or would we demand something at least as good as Annie Hall (1977) or Dog Day Afternoon (1975)?
We’ve gone from using the measuring stick of whether the AI can do something at all to whether it can do something as well as the best humans have done it.
Anyway, I’ve started wondering what the bets will look like in, oh, 2027.
I’m so pleased that we’ve got AI to write books and create art which will free us up to do laundry, pick up trash, and do the dishes. Great job, guys.
You already have machines that do your dishes and laundry for you. What you consider “doing laundry” or “doing dishes” is putting items into these machines or taking the items out of them.
As for trash, I imagine you have a service that takes your trash from your home on a weekly basis. (I know that *I* have one.)
When it comes to the domestic help aspect where we want robots who will do the menial tasks of putting the items into the machines and taking them out for us, unfortunately, the ability to make art might be an epiphenomenon of that. (At least it shows up prior to the machines that have enough ability to sort lights from darks from the stuff that gets fabric softener).
Anyway, I continue to maintain that what we’re calling “AI” is nothing more than a clever search engine capable of skimming the internet and copying what it finds.
Precisely. It’s Google souped up by larger servers.
Play with it again.
Visit Claude and have a conversation about something.
https://claude.ai/new
Talk about your favorite drummers to play and ask for suggestions for songs to listen to.
Or, if you say that that’s just like googling, have it read an essay of yours and make suggestions for what you got wrong.
Talk to it like a therapist. “Hey, I have a co-worker who keeps putting remote control fart machines around my desk and activating them when I’m on the phone. I’m trying to figure out how to be zen about this.”
Just have a quick conversation.
I decided I didn’t want to give it my cell phone number.
Oh, I talk to it on my desktop (where it gets my google account).
I’m currently working in the space and would say that the way to think about the LLM is as one component of a framework. In simple terms, it’s doing the amazing semantic work of interpreting questions, search, and formatting answers.
Where I’m seeing the tech actually go, however, isn’t in hoping that LLMs get smarter and less hallucination-prone (though that’s happening too), it’s using the LLM as a lego block to do the semantic thing, while carefully limiting what it needs to review, plus adding human-led review of output to further fine-tune and train what ‘good’ looks like.
But as others have mentioned, that makes it a really good ‘search’ engine; but much more than *just* a search engine since it does semantics much better than keyword SEO… BUT, it really is kinda dumb and has a really hard time figuring out what a ‘good’ response is in the applications we’d like to use it for, at least not without strict focus and curation. So, a really cool tool that we’ll definitely see more of, but not (probably) as direct access to the LLM itself.
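For anyone who wants the ‘lego block’ idea in concrete terms, here is a minimal Python sketch of that kind of framework. It is an illustration under stated assumptions, not anybody’s actual system: the `Pipeline` class, the keyword-overlap retrieval, and the `echo_llm` stand-in are all hypothetical, and a real setup would swap in an actual model call and a proper search index.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in type for a model call: prompt text in, answer text out.
LLM = Callable[[str], str]


@dataclass
class Pipeline:
    llm: LLM
    documents: Dict[str, str]   # curated corpus: doc id -> text
    max_docs: int = 2           # hard cap on what the LLM gets to review
    review_queue: List[dict] = field(default_factory=list)

    def retrieve(self, question: str) -> List[str]:
        # Naive keyword overlap, purely for illustration.
        words = set(question.lower().split())
        ranked = sorted(
            self.documents.values(),
            key=lambda text: len(words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[: self.max_docs]

    def answer(self, question: str) -> str:
        context = self.retrieve(question)
        prompt = (
            "Answer using only this context:\n"
            + "\n".join(context)
            + "\nQuestion: " + question
        )
        draft = self.llm(prompt)
        # Every draft is queued so a human can mark it good or bad later.
        self.review_queue.append(
            {"question": question, "draft": draft, "approved": None}
        )
        return draft


def echo_llm(prompt: str) -> str:
    # Assumption: a fake model that just repeats the top context line.
    return "DRAFT: " + prompt.splitlines()[1]


docs = {
    "a": "drummers keep time",
    "b": "dishwashers wash dishes",
    "c": "famous rock drummers",
}
pipeline = Pipeline(llm=echo_llm, documents=docs)
draft = pipeline.answer("who are famous drummers")
```

The point of the shape is that the model only ever sees the few documents the retrieval step hands it, and nothing it writes counts as ‘good’ until a person has looked at the review queue.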
On the less Tech Optimist side, the LLMs themselves, outside of the current agentic use of them, could (I’d hypothesize) slip their boundaries with enough training and compute, if we’re not careful.