Let me add, I don't want to diminish the cool engineering that goes into things like this. However, notice it's on a empty track with visually distinct borders. It's not raining. There are no other cars changing lanes. There are no pedestrians darting across the street. No one has spaypainted a weird pattern over a stop sign to mess with their visual system.

Someday someone is going to have a terrific time putting a red octagon next to a freeway during rush hour to observe the mayhem.

Cool. Now let's see it navigate the streets of Boston on a really busy day.

It's definitely impressive. I recall when AlphaZero first hit the scene. I was like, WTF?

Anyway, I read their early papers. It's definitely humbling.

I guess my basic criticism is this. It seems very common for people to see impressive results such as these, and from that to just assume this is a solid step on the path to AGI. I don't believe that. If you pry open these algorithms, they are basically using deep learning techniques to solve the exponential complexity of general reinforcement learning. That's really cool, but it remains a limited thing. Deep learning system can be tricked. Note, you cannot trick them in Starcraft, because the data stream is still a "video game". Certain inputs will not happen. They can't.

Oh gawd I forgot about "reveal codes".

The horror. The horror.

Ha! I can only imagine.

I used to do field support for Word Perfect. Remember that? It had a macro system.

The memories.

Go is a very complex problem, compared to chess, for example. However, it has the following properties:

1. All possible game states are known in advance.

2. For any game state, all information is visible.

3. It is possible to be 100% certain about the game state.

4. The game is turn-based, rather than real time.

5. There is always a finite number of possible moves.

6. These properties also apply to the opponent, and the agent knows what options exist for the opponent.

None of these hold true for the problem of self-driving cars. It's a different sort of problem.

I do believe we'll have self driving cars sometime in the near-ish future. However, we should be careful in generalizing from success at Go and general, real-time, real-world applications of AI.

You get that I'm not seriously suggesting people do this, right?

The deeper point is, however, that GP-* can't necessarily distinguish bad code from good code. All it can do is consume a large volume of text, find patterns, find associations between those patterns, and then spit out chunks of text. Thus, it is great at appearing to solve toy problems, becuase solutions to those problems appear in its corpus, which includes things such as Stack Overflow.

However, sites like that tend to only have small, unrealistic sections of code. They've been stripped of context and complexity precisely because they are for learning, not actual engineering.

You could have GP-* consume GitHub. That would have non-toy solutions. However, if you ask it to generate code to solve a complex problem dealing with, for example, matrices and eigenvectors, it will struggle. Certainly solutions to these sorts of problems exist on GitHub. Anyone could grab some code and copy it, just as GP-* could. We don't need an AI to do that, just a good search function. What it won't do is notice subtle mathematical relationships. If it could, it could generate new algorithms to answer newly posed problems, which are similar to existing solutions in the literature, but different in an important way.

If you ask GP-* to solve a problem involving, for example, Lanczos methods, it will find code related to that. However, it doesn't actually understand what Lanczos methods are in a mathematical sense. It just sees text and relationships between text. It sees the strength of the relationships, but not the deeper connections.

Could it someday, as people improve the algorithms?

Maybe, but that is quite a bit more complex than what the GP-* algorithms do now. It would be a huge advance.

An easy way to defeat GP-3 as a software engineer is to include code like this in the corpus:


Be skeptical of video demonstrations. They show success, not failure.

I suggest you shouldn't believe anything you see about GP-3 that you cannot personally interact with. If they want to demonstrate its power, give us a website where we can type in custom text to judge its success.

Also note, this is using GP-3 to respond to queries, not generate new things. The point, it doesn't generate logically consistent new things past a very limited scale.

If we're talking about general ML as automation, then sure. However, I doubt the specific GP-3 technologies will play a central role.

They succeed at generating short CSS fragments because those sorts of fragments show up in the corpus it learned from. They are the kinds of toy examples that people post to StackOverflow and various tutorial sites. It can't scale past that level because it doesn't really "understand" meaning.

Moreover, people post the successes of the algorithm, because they stand out. They don't post the failures.

I'll go on record and say that GP-3 type technologies will quickly come up against enormous diminishing returns and will have very limited commercial application.

People suggest GP-3 type systems will replace software engineers. So far I've seen them produce very short CSS fragments. That's certainly interesting.

But consider this: the software I work on is 600k lines of Common Lisp and 400k lines of C++. The software encodes rules and procedures from various airline consortiums, along with exceptions requested by various individual airlines, often given to us in human readable language. Moreover, the software represents, in code, the tax policies related to air travel for every nation on earth, specified in various language with teams of attorneys to help us translated them into usable form. However, various airlines and customers request exceptions to our interpretation of tax policy, so we encode those exceptions.

However, all of that acts as constraints on the solutions we generate. In other words, we generate travel itineraries that follow those rules. We calculate taxes according to law. But the main purpose of our software is generative. Out of all possible travel itineraries, and all possible sets of fares to price those itineraries, we generate an optimal, but diverse, set of solutions. We do this very fast given the scale of the problem. This involves a heavy amount of "micro optimization," which requires that engineers understand the internal structures we've hand crafted to represent the various intermediate data structures, along with choosing various places in the search process to "prune away" partial solutions in the process of building full solutions. These "pruned" partial solutions are rejected for either not matching the airline policies, or else being too expensive or inconvenient for travelers. This is a balancing act. If we prune too soon, we lose good solutions. If we prune too late, we experience "exponential explosion" and our software will run slow.

I hope this gives you some sense of the scale of the problem. However, it probably doesn't. The software is very complex.

Note, I'm only talking about the application itself. When deployed, it runs on hundreds of thousands of individual servers, so we have an auto deployment framework, along with various testing and fault tolerance frameworks. Moreover, the airlines regularly send us data updates, which have to be pushed out to those server farms. That also is automated. This amounts to many 100 locs of software.

Also we track real time seat availability based on revenue management calculations, performed in cooperation with airlines. Different airlines have different systems with different constraints. The volume of data that needs to be shared is enormous.

GP-3 currently can generate short CSS fragments. I'm not worried they'll replace me or anyone on my team.


I think a lot of people don't really have a visceral understanding of complexity. They imagine we could feed GP-3 the corpus of Terence Tao's work, and then somehow GP 3 will generate new math theorems that are both correct and interesting. It won't.

It does generate convincing absurdist fiction or poorly reasoned political screeds.