# sports metrics and the problem with unconventional wisdom

Freddie

Freddie deBoer used to blog at lhote.blogspot.com, and may again someday. Now he blogs here.

### 93 Responses

1. Will says:

I can accept that statistics are frequently cherry-picked, but if you’re going to argue that statistical analysis is wrong in this context, I think you need to point out what Football Outsiders missed.

• Freddie says:

Uh, you mean like “basic argumentative consistency”?

• BCChase says:

See below – I think they were consistent from a statistical standpoint, one principle of which is that you do not draw conclusions from single points of data, but from large samples and aggregations. They are not arguing against all inductive reasoning, only against inductive reasoning from any one single instance.

2. BCChase says:

The defense to your inductive reasoning attack is that on a single football play, the random chance and luck factors involved make it very difficult and/or wrong to inductively reason from it. Aggregating individual instances and then measuring average trends goes a long way towards eliminating the random chance. In the Belichick example, the point is that based on large scale inductive reasoning, his decision making process was sound, and just because it did not work does not refute the principle, because random chance played a large role (for example, the way Faulk bobbled the ball, the spot by the referee.)

• Freddie says:

“The defense to your inductive reasoning attack is that on a single football play, the random chance and luck factors involved make it very difficult and/or wrong to inductively reason from it.”

Precisely why drawing deterministic readings of statistical data is folly.

• BCChase says:

It is only folly for small sample sizes.

• Barry says:

BCChase: “The defense to your inductive reasoning attack is that on a single football play, the random chance and luck factors involved make it very difficult and/or wrong to inductively reason from it.”

Freddie : “Precisely why drawing deterministic readings of statistical data is folly.”

Nobody is doing that; they are urging that one go with the option which in similar circumstances has had the higher proportion of success.

• Freddie says:

In the most similar of circumstances– that very game– the Colts failed to score on 9 of 14 drives. Why is one method invalid and the other valid?

• 9 of 14 is still only a little over 60%. It would be one thing if we were talking about a 12-6 game or something like that, where the context suggested that we should throw the ordinary statistics out. But this was an abnormally high-scoring game – if anything, the context makes the odds even a little better to support going for it.

• BCChase says:

Exactly.
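
For a rough sense of how much 9 of 14 can tell us on its own, here is a minimal Python sketch that puts a 95% Wilson score interval around that stop rate. The function name `wilson_interval` is made up for illustration, and treating the 14 drives as independent trials with one fixed stop probability is a simplifying assumption, not something the game guarantees.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    )
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # 9 stops in 14 Colts drives, the figure cited in the thread.
    low, high = wilson_interval(9, 14)
    print(f"observed stop rate: {9 / 14:.3f}")
    print(f"95% Wilson interval: {low:.3f} to {high:.3f}")
```

Under those assumptions the interval runs from roughly 39% to 84%, which is wide enough to be consistent with both a shaky defense and a dominant one. That is one way to see why a single game's drives are hard to lean on by themselves.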

3. Alex Massie says:

Surely the most important thing in this instance is that the decision Belichick took gave the Pats TWO ways to win. That is, get the first down and run out the clock OR don’t get it but stop the Colts from scoring a touchdown. If you punt the ball then you’re halving the number of ways you can win the game, i.e., you put everything on your defence. Going for the first down spreads the risk of losing between the offense AND defense. One need not be any kind of statistician to appreciate that this seems like a good idea.

• BCChase says:

I agree, and to support this, assume Belichick seriously doubted his defense could stop Manning from scoring with ~2 min left from wherever on the field. Then the best option is not to give Manning the ball back. So to me, the real question is whether the Pats D could have stopped Manning from 70 yards out. That Belichick seemed to think “no” is a bigger problem to me than his going for it on 4th down.

4. Roberto says:

What nearly all of the devotees of the new metrics miss is that the law of diminishing returns applies to their pursuits, too. Case in point: fielding metrics. Anyone who has watched Adam Dunn knows that he is a terrible fielder, a defensive liability if ever there was one. But the claim that his defense costs his team more runs than his offense provides them is an example of what you wrote about. By their logic, the Reds and Nationals would be better off with a utility man who is a decent defensive player but who can’t hit a lick (Nats fans know him as “Pete Orr”) than someone with an OPS in .900 range. This is why the phrase “too clever by half” was coined.

At least baseball lends itself to this kind of statistical analysis — the attempt to extend it to football and basketball strikes me as the sporting equivalent of “physics envy.”

• Paul McLeod says:

Wait, how do you know that his offense is good enough to make up for his truly atrocious defense? A reasonable observer of fielding metrics would admit that he doesn’t KNOW that the defense is bad enough to overwhelm the offense (advanced defensive metrics being relatively volatile), but certainly the balance of the evidence makes “Adam Dunn is a below average player” a likely proposition. On what grounds do you rule it out?

• Paul McLeod says:

Also, if you “can’t hit a lick” you actually have to be quite good on defense to be average. We’re talking about a phenomenally bad defensive player here, by all indications.

5. cb says:

If they had punted and lost, would you be leading the discussion about how the loss counts as evidence that punting was the wrong decision?

• Freddie says:

No; as is clear, I am not saying that “this one event demonstrates the folly of going for it.” I am arguing that the urge to dispute several decades of received wisdom and experience is what is driving this sports contrarianism, not a supposedly hyper-rational appreciation of the facts.

• Paul McLeod says:

Can’t it be both?

• BrianC says:

why do you assume that making the decision to punt in that situation is necessarily due to “several decades of wisdom and experience”? personally, i think the decision to punt there is commonly accepted because it minimizes the risk to the coach’s reputation. if his defense blows it, then he can point the finger at them. if they don’t convert on 4th and then the defense blows it with a short field, everybody points their finger at the coach. the statistical argument is that you should make the decision that gives your team the best chance to win, regardless of the “commonly accepted wisdom” which often incorporates many other sociological factors than simply “what gives us the highest likelihood of winning”.

6. jamie says:

I think the Football Outsiders’ point is that the percentages indicate that this was the right call; just because it didn’t work doesn’t mean it was wrong. let’s suppose that every day, you leave work along a footpath instead of along the cliff. you know that leaving along the cliff means that there’s a huge chance (90%) that you slip and fall to your death. one day, you go along the footpath, and you happen to slip and break your neck. there was only a 1% chance of this happening, but hey, shite happens. does this mean your original decision to walk along the footpath was wrong? nope, you increased your chances of winning (surviving) as compared to going along the cliff.
inductive reasoning is flawed in this instance because it doesn’t take into account the percentages of success of the various options. the same can be analogized to football and going for it on 4th down.

7. Freddie says:

I think it is worth pointing out– Belichick is, by all accounts, the strong man in the Patriots roster management system, as well. He is functionally the GM as well as the coach, if not by title. If he built a defense that he is sure is incapable of stopping Peyton Manning– despite being successful at stopping him more often than not in that very game– then it is his own fault. Why are his vaunted powers as a defensive genius and the roster he put together himself incapable of defending the Colts?

• Freddie says:

And the crickets chirp.

• Will says:

You’re changing the subject. We’re talking about the utility of statistical analysis in the context of calling plays, not Belichick’s roster-management skills.

• Freddie says:

OK, again– one set of statistics is that the Patriots had quite handily stopped the Colts the majority of the time in that game. Why do you privilege the other set of statistical data? My data is from the self same game. Surely there is no other data that so thoroughly meets the burden of being applicable to that situation than data from that same game.

Again, the sports metrics world reveals its fundamental immaturity. The vast claims of superior predictive power are just utterly unfounded; go around and evaluate all of the season opening predictions of the metrics crowd and compare them to various other people making predictions based on other criteria. The metrics crowd, despite all of their crowing and their self-fellating rhetoric, don’t have any consistently superior success to the other kinds of evaluation. If these advanced metrics are so powerful why is that the case?

• BCChase says:

At least for baseball you are wrong. Prediction systems like CHONE or from Bill James are often more accurate than more traditional, opinion-driven rating systems. For an example close to home, a lot of traditional decision-makers thought the Royals would win the AL Central this year. None of the numerical prediction systems bought it. I should have listened to the metrics more before I got my hopes up.

I think the more team-based sports like football and basketball are much harder to predict, and so the metrics focus on descriptive stats.

• Paul McLeod says:

If you were to construct a truly powerful empirical model (I won’t say “perfect” because any model creator should know that no such thing is possible), those 14 drives would likely be among your very most relevant and important data points…and they would still likely be dwarfed by the preponderance of the evidence. There’s so much data elsewhere that is still highly applicable to this situation that you would be foolish to ignore, even if on a point-by-point basis the other data is not as valuable.

The other thing you’re forgetting is that, based on historical success rates, teams only score touchdowns 53% of the time from the 30. They score touchdowns about 30% of the time after being punted to in that situation. Stretch those numbers as you will based on the game situation and the teams, Belichick was likely increasing the Colts’ chances of scoring by less than a factor of two by failing on fourth. And you have to figure that the Patriots are much more than 50% likely to gain two yards on a play.
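
The arithmetic in that comment can be laid out as a small Python sketch. The 53% and 30% figures are the ones quoted above; everything else is a simplifying assumption for illustration: a successful conversion is treated as a win (the Patriots run out the clock), a Colts touchdown is treated as a loss, and the function names are hypothetical.

```python
# Rough sketch of the fourth-down arithmetic, using the two rates quoted above.
# Simplifying assumptions (not claims about the real game): a conversion wins
# the game outright, and a Colts touchdown loses it.

P_TD_SHORT_FIELD = 0.53  # Colts TD rate after a failed fourth down (quoted figure)
P_TD_AFTER_PUNT = 0.30   # Colts TD rate after a punt (quoted figure)


def win_prob_go_for_it(p_convert: float) -> float:
    """Patriots win if they convert, or if they fail and still keep the Colts out."""
    return p_convert + (1.0 - p_convert) * (1.0 - P_TD_SHORT_FIELD)


def win_prob_punt() -> float:
    """Patriots win if the Colts fail to score a touchdown after the punt."""
    return 1.0 - P_TD_AFTER_PUNT


def break_even_conversion_rate() -> float:
    """Conversion rate at which going for it and punting are equally good."""
    # Solve p + (1 - p) * (1 - s) = 1 - q for p, where s is the short-field
    # TD rate and q is the post-punt TD rate. This gives p = (s - q) / s.
    return (P_TD_SHORT_FIELD - P_TD_AFTER_PUNT) / P_TD_SHORT_FIELD


if __name__ == "__main__":
    print(f"punt: {win_prob_punt():.3f}")
    for p in (0.40, 0.50, 0.60):
        print(f"go for it, p_convert={p:.2f}: {win_prob_go_for_it(p):.3f}")
    print(f"break-even conversion rate: {break_even_conversion_rate():.3f}")
```

With these inputs the break-even conversion rate is a bit over 43%, so under the stated assumptions going for it comes out ahead whenever the chance of gaining two yards is better than that. Different inputs for this particular game would move the threshold, which is where the real disagreement in this thread lies.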

• Dave says:

If he thought the 9/14 drives in that game were relevant I’m sure he would’ve punted. In his estimation, the fatigue and injuries (2 starting linemen, the “best” pass rushing LB AND his backup) rendered that particular statistic unreliable.

It’s still definitely possible that he was overweighting the three previous drives based on their results (two very fast long drives), but more likely he didn’t think they could perform to their previous level due to those external factors and/or something else he may have noticed about Manning’s play.

• BCChase says:

C’mon. You gotta give the commento-sphere more than 4 minutes before so smugly gloating.

• BCChase says:

This is where I think the critique “Peyton Manning is in his head” gains a little traction. Manning has a big history of late-game heroics, and even when the D had stopped Manning, he was still driving the ball on them fairly well. I think that shaded Belichick’s thinking. If you have some fear of Manning, 2 yards for your offense looks awfully small.

8. jj says:

Had Belichick decided to punt the ball and Manning led a successful drive to win the game, would we be putting the blame on his decision to punt the ball? I suspect not, even though, one could argue, that would be abandoning induction. The talk would be of the defense’s last-minute fourth-quarter letdown. Why can’t this be a case of the offense’s last-minute fourth-quarter letdown rather than Belichick’s decision?

• Michael Drew says:

Actually, I think the talk would have been of Peyton Manning’s unbelievable 21-point fourth quarter come-from-behind victory, a legend is made etc. (and oh by the way the Patriots’ D needs to play 60 minutes, what’s the deal with their conditioning?)!

9. RobF says:

One of the benefits of statistical analysis is that it allows us to rise above the single data point so that we can evaluate the aggregate data. The assumption is that an experiment run many times is more valuable than an experiment run once.

Bill Barnwell is not arguing against induction. He is arguing against the kind of lazy, data-impoverished induction that seeks to draw broad conclusions based on a single outcome. You seem to be arguing that the most recent outcome should be given disproportionate preference over the set of all outcomes.

If your approach is superior to his, then we really don’t need large epidemiological studies to know whether cigarette smoking causes lung cancer. All we need to do is ask the next smoker we meet whether or not they have lung cancer. If they don’t, then your inductive logic will lead us to conclude that smoking must be safe.
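
The single-outcome problem can be made concrete with a short Python sketch. Suppose, purely as a hypothetical, that a decision really does succeed 60% of the time. The function below (its name and the 60% figure are assumptions for illustration, not numbers from the article) computes the exact binomial probability that the decision fails in at least half of its trials, i.e. that someone judging only by results would conclude it was the wrong call.

```python
from math import comb


def prob_looks_bad(p_success: float, n_trials: int) -> float:
    """Probability that a decision with true success rate p_success
    succeeds in no more than half of n_trials independent trials."""
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return sum(
        comb(n_trials, k) * p_success**k * (1.0 - p_success) ** (n_trials - k)
        for k in range(n_trials + 1)
        if 2 * k <= n_trials  # successes in at most half of the trials
    )


if __name__ == "__main__":
    # Hypothetical decision that is correct on the merits: it works 60% of the time.
    for n in (1, 11, 101):
        print(f"n={n:>3}: looks like a bad call with probability {prob_looks_bad(0.6, n):.3f}")
```

With one trial the good decision looks bad 40% of the time, which is just its failure rate. The probability of being misled shrinks as the number of comparable trials grows, which is the case for aggregating.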

• Freddie says:

“Bill Barnwell is not arguing against induction”

Quote: “you cannot judge decisions by their outcome.”

• Jaybird says:

It worked in theory! If you had actually *READ* the playbook, you’d see that they didn’t apply the theory correctly! There weren’t enough coaches making sure that the players applied themselves! At least he wanted to win!

• BCChase says:

That’s not what Barnwell said in the original article; that is you paraphrasing him incorrectly. He said “you can’t judge Belichick’s decision by the fact that it didn’t work.” A decision is a single instance, and a single piece of data. And you cannot always judge from one piece of data whether a decision gave you the highest chance of success or not. This is not a discussion of absolutes, but of proportions. And Belichick’s failure does not change what the proportions of success were going in. You are reading absolutes of decision-making into it when I don’t think they are there.

• Freddie says:

Excuse me, the above is a direct quote. Follow the link before you accuse me of misquoting someone. Seriously.

• BCChase says:

Sorry. Was focusing on the bolded quote. The analysis that we are talking about proportions of success and not absolutes stands, though.

• BCChase says:

Clearly I got going a little too fast here. Apologies for the faulty accusation.

• sidereal says:

Quote: “you cannot judge decisions by their outcome.”

Yes. But not Quote: “you cannot incorporate decisions into a body of evidence by their outcome”, which is what you seem to be arguing Barnwell wrote. Which he didn’t.

You have some stronger arguments and weaker ones in your shotgun attack on statistical analysis in the article and following comments. ‘Barnwell doesn’t believe in inductive analysis’ is, by far, your weakest. So I recommend you abandon that one.

• Freddie says:

Quote: “you cannot judge decisions based on outcomes.”

I’m sorry it’s inconvenient for all of you that he said it, but he said it.

• BCChase says:

Perhaps a better statement would be “you cannot judge single decisions based on outcomes.” No one outcome invalidates the analysis of the proportions leading to success that went into the decision, especially when there is a lot of chance involved in the execution. I think that is closer to what he means; I agree with sidereal that you are applying that quote too broadly.

• sidereal says:

It’s honestly astounding that you won’t let this one go. You know what he means, right? That because you hit on a 12 and got a King and busted doesn’t mean it was the wrong play to hit on a 12? That’s what he’s saying. You know that’s what he’s saying. And yet you’re still going at it.

• BCChase says:

Just for emphasis: this is the correct gist of it.

• Jay Daniel says:

I don’t know why I’m bothering responding to this comment, since you’ve ignored the points of every other commenter. But you must be sinking to the level of “annoying sports argument stubbornness” on purpose here. I can’t come up with any other explanation for why someone — who is clearly very intelligent — would utterly refuse to concede that you are mischaracterizing Barnwell’s argument. It may be that statistical analysis is less good at predicting sports outcomes than Barnwell implicitly believes. But anyone with a passing familiarity with statistical analysis would recognize that he is employing statistical best practices here.

• Freddie says:

I think that rational human beings evaluate decisions based on the outcome of those decisions– not entirely, not without acknowledging error, but to some large part, they do. They do because this is frankly one of the only ways to make considered judgments about human behavior. And I think that this general schema cannot be dismissed entirely in this instance simply because it is to the benefit of the people who are so adamant about asserting their own unconventional bona fides.

Yes, I get it– you are all brilliant iconoclasts far too free-thinking and out-of-the-box to be hemmed in by the Man and his antiquated notions of “punting” and “playing defense”. I am suitably impressed. I continue to believe that going for it on fourth and 2 from the shadow of your own end zone was the wrong decision. I don’t claim, unlike my interlocutors, to have some sort of unique access to the capital-T truth on the matter, nor do I pretend that I am accessing some hyper-rational computer knowledge that demonstrates that my opinion is correct and that everyone who disagrees with me is the enemy of rationality.

Like all attempts to leverage particular viewpoints with appeals to some sort of perfect or non-situated rationality, sports metrics rely ultimately on assumptions, conjecture, the privileging of certain data over others, human contingency, human error and more than anything else, the vast bias towards seeing what you want to see in the data. Does that make it useless? Absolutely not. Does that make claims that any consideration of what vast numbers of coaches, scouts and players are saying is some kind of obstinacy or clinging to outmoded traditionalism a farce? Yes. I believe it does. Take from that what you will; I’m sure you’ll devise a set of statistics that you’re certain demonstrates that I’m wrong.

• BCChase says:

Rational humans evaluate decisions based on their outcome, but also based on their experiences of multiple past outcomes. Just because something bad happens once, does not mean that the decision making process that went into it was faulty. Just because I busted my knee once on a freak accident, does not mean that regular exercise and sports was a bad idea. Often, we make decisions based on one instance and it is to our detriment. If a predator drone has been generally successful at killing terrorists, but in one instance kills a lot of innocents, should we stop? By your logic, yes, that one outcome can be placed above all others. Whereas a lot of foreign policy experts I read through this site and otherwise say that unfortunately we can’t, because in the other, preponderance of examples, the predator drones are helpful.

In the sciences, decisions about research directions and diagnoses of protein behavior are always based on multiple experiments, a preponderance of data, because one experiment can always be wrong. Nothing is called a scientific “fact” until there is a lot of data behind it, with error bars affixed. I don’t view myself as smarter than a football coach, or somehow above the analysts I watch on TV. I just wonder why the same data analysis and statistical analysis that drives science can’t be applied to sports. Is that making me an iconoclast somehow?

10. Freddie says:

Please note that I’m a fan of being a bit of a punk while arguing. Especially about things that are low stakes like sports.

• Jay Daniel says:

Noted. Recalibrating now.

11. jamie says:

Freddie, you’re cherry-picking the data set. Yes, the Pats had stopped the Colts for a lot of that game; but they hadn’t on the last two Colt drives, and the Pats defense was, by all accounts, gassed. Peyton is also a legendarily clutch QB that thrives on these moments. There are arguments going both ways based on this game in particular.

The point the statisticians are trying to make is that all things considered, based upon the vast amount of data we have on all football games, the safer play is to go for it on 4th and short. Yes, there could be confounding variables (if your QB’s leg is broken, then maybe you should punt), but in general, stats give us something that is more accurate than your intuition. This is not controversial in most spheres of social science.

• Freddie says:

And they are cherry-picking too. The difference is, one group insists that they have unerring access to “rationality,” and one does not.

• Grunthos says:

This is the claim that leads you astray, Freddie. They aren’t insisting they have unerring access to “rationality”. They are insisting that access to rationality is improved when you pay attention to the larger world of less-immediate data as well as the small world of the particular situation analyzed. You are the one insisting that the statisticians must be judged on the motives you attribute to them and not their arguments, rather like a partisan hack insisting that policy arguments are “irrelevant” or “inconsistent” because they come from a neutral study group.

It’s not all about you, Freddie. Sorry to be the latest bearer of that news.

12. jamie says:

I doubt that they think they have “unerring” access to rationality. But please explain how they’re cherry picking by compiling data from a vast number of football games?

• Freddie says:

Because they are choosing what to compile, and they are choosing the relevance of the various pieces of data they are compiling.

• jamie says:

ahahahaha now I know you’ve gone off the deep end. This describes every statistical effort, ever. You have to explain why their choices are WRONG in order to disprove them. Every statistician makes choices; it’s a necessity in order to create data. Unless you’ve just gone totally pomo on us and are neutral in the battle between cw and stat-heads. In which case, don’t defend conventional wisdom either, because it, too, makes decisions based on evidence, and decides what evidence to compile and its relevance. Inductive reasoning chooses to privilege evidence that appears only after the fact, for example.

13. Alex Massie says:

OK, saying you “can’t” judge Belichick’s decision because it didn’t pay-off isn’t an ideal way of putting it. Saying “it’s unwise” to judge it just because “in this instance” it didn’t work might work better.

But: at the moment Belichick took the decision, what did he know? He knew, for sure, that if he went for it and got the first down the Pats win the game. If he goes for it and the Pats are stopped, they *might* lose the game. If they punt they *might* still lose the game. Of course, in both these instances the Pats defense *might* stop the Colts and win the game.

But if you punt three things could happen:

1. The Pats stop the Colts anyway.
2. The punt is returned for a TD and the Pats have enough time left to score themselves.
3. Regardless of where the Colts begin their possession, they march down the field and score, leaving the Pats insufficient time left to score themselves.

Of these 3 is clearly the worst outcome for the Pats.

Combine that with the certainty of winning the game if they get one more first down and it seems to me that the balance of probability – and the limits of forecasting – support the decision to go for the first down.

14. Freddie says:

OK, OK. Let me restate my position. I believe that the Patriots had the best chance to win by punting it deep and letting their defense play. I believe both Peyton Manning’s abilities and the Patriots defense’s inability to stop him have been widely exaggerated in this discussion, and particularly his ability to cover 70 yards in the two minute drill. I certainly think that the fact that the Patriots defense did indeed stop the Colts on many possessions in that game is germane to the question at hand, and I find that the people arguing that the Colts had “momentum” are likely people who would dispute the very idea of momentum when doing so is flattering to their preconceptions. And, yes, I think the fact that the Patriots did indeed go on to lose the game in precisely the way that anyone who would have advised against going for it could have predicted is indeed a fair and valuable metric for evaluating his decision. I really do. Is the decision, overall, debatable? Yes, it’s debatable. But in this instance, exactly what Bill Belichick did not want to happen, happened, and to say that totally ignoring that reality is not only smart but obviously so seems to me to be the product of a mindset that privileges iconoclasm, and sneering at the other side, over utility.

I have said and will say again that if the metrics crowd was more inclined to self-criticism, and to admitting that they have their own set of biases, blinders and faulty assumptions, they would be vastly more credible in their opinions. Instead, as a movement they are incredibly patronizing and evangelical, a really odious combination.

I would further like to point out that I was responding to a post by Will which was gushing towards Bill Belichick– the man who assembled this roster, coached them for all of training camp and the majority of a football season, and yet was still unable to get them to stop Peyton Manning when it mattered most. Defending this decision as a way to defend Belichick is very, very strange to me. Perhaps the defensive genius and brilliant football mind has been overrated for some time, both in his ability to build and direct a defense, and in his willingness to flout the conventional wisdom in the way his fans enjoy so far out of proportion with its success. End communication.

• Freddie says:

I believe that I indeed say “indeed” too much.

• Jay Daniel says:

Your argument seems to boil down to this:

“in this instance, exactly what Bill Belichick did not want to happen, happened, and to say that totally ignoring that reality is not only smart but obviously so seems to me to be the product of a mindset that privileges iconoclasm, and sneering at the other side, over utility.”

This is dumb, dumb, dumb. Here’s a counterfactual for you that proves it: I was watching my team three weeks ago. They had a 4-point lead, the ball on their 18 yard line, and they had a 4th and 1 with about 2 minutes left. They punted the ball, the returner gained 30 yards on a 50 yard punt, and the opposing team proceeded to score a TD and win the game. Exactly what I did not want to happen, happened. Therefore, by inductive reasoning, the decision was a stupid one. Right? Am I following you?

I’m not in the sports metrics crowd, so when it comes to what I’m going to say next, I don’t really care how credible they are to you (although, strange how your argument against them sounds like your arguments against conservatives… and my own protestation mirrors Conor. blech.) My own opinions come from watching NFL football my entire life (I’m guessing we have similar bases for our opinions), and enduring a lifetime of pain and heartbreak following my teams’ coaches running the ball up the middle for 3-and-outs with the lead with only a few minutes left in the game, proceeding to punt the ball away, and then promptly losing (just like 2 weeks ago). In my admittedly anecdotal experience, this happens roughly 100% of the time. It’s up there with the prevent defense (also guaranteed to prevent a win 100% of the time) on the Mt. Olympus of my most-hated conventional coaching moves. By using inductive reasoning, I have determined that both are stupid practices under most circumstances. So yeah, I love it that Bill Belichick doesn’t coach that way. I also note that his decision was pretty consistent with his general coaching practices, and they have seemed to work pretty well for him over time, and in fact better than ANY OTHER ACTIVE NFL COACH. So maybe he likes inductive reasoning too.

• Freddie says:

But of course, that is not at all what my argument boils down to. But strawman away, if that suits you. One thing that is certain is that the response in this space demonstrates that my larger understanding of the way your movement argues and acts is entirely correct.

• Jay Daniel says:

You’re probably right; your comment/essay was too long and meandering to be able to boil down in a blockquote. But that was certainly one of your points, it’s related to your overall failure to understand how an individual outcome should be used to evaluate a discrete decision, and it is still dumb. But don’t ever let other people’s arguments get in the way of making sure your preconceived notions get confirmed.

15. Freddie says:

Also– I am clearly guilty of overstating my own case in the process of arguing it, which is exactly the failing that I am identifying in the people who are arguing against me. Which, you know, is a character flaw, and hypocritical of me. I’m working on it; you’ll all just have to be patient.

• BCChase says:

You forgive me my faulty misquote accusation, I’ll forgive you your hypocrisy. Then we can limply shake each other’s hand, Belichick-style, and go home.

16. Josh says:

Okay, here are some of the problems in this post.
1) Show me a serious sabermetrician that will say “A walk is as good as a hit on average”. Sabermetrics has shown that walks were more valuable (led to more runs/wins) than most traditional baseball people thought. And of course, the value of a walk and a hit changes depending on the situation, which no rational person would deny.
2) What do you mean, “knows more” about baseball? He knows how to play baseball more than the FJM guys. He knows what it’s like to experience the game more than them. He knows what it is like to be in those situations. So what? I don’t really get your point here.
3) If a choice usually leads to a good outcome, but fails occasionally due to the vagaries of fate, was it a bad decision? No, it was simply bad luck. If the Patriots have a higher probability, based on aggregating the data from thousands of past experiences, of (converting the 4th down or stopping the Colts from driving 30 yards) than (stopping the Colts from driving 65-70 yards), then the fact that it failed once does not make it a bad decision. Sometimes, you make the right decision, then shit happens. DePo does a good job making this point. http://itmightbedangerous.blogspot.com/2008/06/draft-review-about-process.html
4) Um, yes conventional wisdom is often challenged, and it is often true. For example, it used to be conventional wisdom that guys like Phil Rizzuto and Nellie Fox were valuable because they played defense. That conventional wisdom was challenged by “contrarian sabermetricians” before they did the work and found out, yes, a good defensive middle infielder is very valuable. However, conventional wisdom is not always true; using FO as an example, the conventional wisdom in football is/was “you need a good running game to win,” when in reality a good passing game is now more important. But the only way to figure out if the conventional wisdom is correct is to challenge it.

• Freddie says:

But the choices in the metrics used to challenge the conventional wisdom are full of debatable and contingent assumptions at every point.

• Josh says:

Yes, but that’s no reason to throw our hands up in the air and blame it on the gods of clutch, or whatever the conventional wisdom is at the time. The fact is, we can point to similarities between this situation and other situations, and we shouldn’t reject systematization out of hand because there is some contingency there. To bring it back to one of your arguments, if we did that, we could never function in the real world.
But I’m curious, what are the debatable assumptions in the metrics here? That the conversion % in the Patriots-Colts game is analogous to the conversion % in other situations? Or am I missing something?

• Freddie says:

Exactly that, for one. Another is simply acknowledging that there are people who can come up with similarly complicated metrics that would indicate that punting was the obvious choice.

• Josh says:

I actually agree with that point. I just disagree with using results-based analysis to say what you said.

• Robbie says:

I still want to know what sabermetrician thinks a walk is as good as a hit. I don’t think I know of one.

17. cdoyle says:

At one point in time — it wasn’t that long ago, really — every doctor in America treated maladies by attempting to “bleed” them out of patients. This was the way things were done.

The only problem with appealing to authority is you’re right until everyone else realizes you’re not, and by that time it’s too late to save your job.

• Freddie says:

This comment is nonsensical, fails to respond to any of my points, and in general does not meet even the basic requirements of amounting to an argument.

• cdoyle says:

What points? You assigned a single personality to a large, diverse group of people, gave quotes out of context, and betrayed a lack of understanding for the most basic concepts of statistical analysis. My comment was an attempt to point out that, unfortunately, there are always lots of people willing to pipe up for orthodoxy, regardless of evidence presented to the contrary. I didn’t mean it to be a point-by-point refutation of what you wrote, because I have no desire to bring myself down to your rhetorical level, and let you beat me via experience.

• Freddie says:

Try this one on for size: when did I ever make an appeal to authority? And how on God’s green earth is appealing to the authority of the various sabermetricians any different or any better? Again, you all keep confirming exactly what I am accusing you of: you are possessed of absolute certainty in your beliefs, you refuse to question the assumptions undergirding those beliefs, and you respond to contrary opinion by losing your shit and saying “you just don’t understand”. Heal yourself, doctor C.

• BCChase says:

Statistical analysis is one of the bedrocks of the scientific method. I believe as strongly as I believe anything that well-done statistical analysis in science can tell us something about our world. I don’t think that stance is all that controversial. But that is all that most of the objections to your post amount to: a defense of statistical analysis. And I think you are reading a personality into the objecting commenters en masse that is not there. I am fully willing to say that given all the information Belichick had he may have made the wrong choice. That’s different from attacking a belief in statistical analysis as unfounded.

18. Andy says:

Freddie, What would you state the odds are of the following?
1) Patriots making 4th and 2 at their own 28.
2) Colts scoring a TD following a punt (roughly at the Colts 32 yard line).
3) Colts scoring a TD from the Patriots 28.

• Freddie says:

I am not in the habit of inventing odds and pretending that they have the authority of mathematics, unlike most of the people in this thread. I think the Patriots had a better chance to win had they punted, particularly given the fact that the Patriots had already successfully defended against the Colts on nine of fourteen possessions. Unlike all of the people arguing here, I acknowledge that I could be wrong.

• Andy says:

Freddie,
The Colts had 4 out of 13 drives that were more than 70 yards (the 4 touchdowns).
The Colts had 9 out of 13 drives that were less than 29 yards.
The Colts had no drives between 29 yards and 70 yards.
Therefore, there was no difference between punting and not punting, right?
It’s not that you won’t admit you are wrong. When I saw the play, I thought it was crazy too. It’s that you are condemning an entire method (statistics) without any justification.

• Keith says:

I find it funny that you say the Patriots “successfully” defended the Colts on nine of fourteen possessions. In that game the Colts scored on roughly 64% of their possessions… and if you read Peter King’s column, by his account the Colts offense was having a “bad” night.

Without even discussing the stats side of this debate, which has been covered ad nauseam, here are some logical points defending Belichick and his decision that went against the “conventional wisdom.”

1) His opponent is quite efficient on offense. As noted above, the Colts scored on 64% of their possessions on what was considered a bad offensive night for them. Further, they had been gaining momentum and were playing at home – the Colts just scored in under 2 minutes and had forced a 3 and out. My hunch is that had the Patriots been playing the Ravens, or even the Steelers, then Belichick would have punted. In short, this decision could be argued as very opponent determinative as much as it was statistically based. Also, recall the Dolphins game where Peyton scored and left the Dolphins with no time to counter. Perhaps Belichick looked at it this way: if we make it, the game is over; if we don’t make it, they still have to score and if they do score it should take less time, giving us another crack at it. Think about how incredibly good Peyton Manning is: in the post game press conference he commented how he told the offense not to score too quickly! Seriously, think about that. That offense is so confident and so good that they believed they could milk time off the clock and decide when they wanted to score. Personally, I think this argument would hold more water if they had just let Addai score, leaving about a minute left for the Patriots to get into FG range.

2) The strongest unit on the Patriots is their offense. Wouldn’t you prefer to have your best unit on the field with the game on the line? Tom Brady is one of, if not the, best QBs in football and the Patriots offense makes a living off the short passing game, which essentially doubles as their running attack. With weapons like Randy Moss, Wes Welker, and Kevin Faulk you have to like their chances of getting 2 yards against a spread out Colts defense. No one knows his team better than Bill Belichick and it could be argued that he made the decision to win or lose the game with his strongest unit, his offense, rather than his weaker units, defense and special teams.

3) Rule changes. The conventional wisdom may be outdated. Remember the conventional wisdom developed many years before rule changes that benefitted QBs and WRs. Therefore, you could argue that Belichick’s decision takes this into consideration in two ways: 1- He is looking to exploit these rule changes to his advantage on offense (which basically explains how the Patriots’ offense has evolved to what it is today) and 2- He is attempting to prevent the Colts from taking advantage of these rule changes. The bottom line is that the recent rule changes tip the scales in favor of the offense, even if it is ever so slight. Obviously, had Belichick talked Bill Parcells into such a move when he was the Giants DC in the 80’s this would have been a pretty dumb decision, and not only because the Giants’ defense was their strongest unit. This conventional wisdom developed in a different era and it is very important to realize this.

I would also like to note one of the biggest reasons why almost every other coach would have punted in that situation: CYA (cover your ass.) We all know the conventional wisdom is to punt the ball in that situation. However, Belichick knows there isn’t any way Kraft fires him and even if he did, he would get another job in about .5 seconds. I feel like I have made a compelling case that Belichick made a rational decision based on logic alone and not just statistics. Now, I have no idea how many other coaches would even employ such outside the box thinking; I suspect most would just robotically punt the ball. But there is a reason why Belichick has 5 Super Bowl rings. The guy is a creative thinker.

Anyway, if you read Joe Posnanski’s article on SI.com you will see that, according to his formulas, the Pats had a 78% chance of winning by going for it and a 70% chance of winning by punting. To be honest, I think we’re splitting hairs over this decision and I think both sides are being ridiculous by claiming that there is an inherent right or wrong decision here. Even if we assume that Joe Pos’s #’s are 100% correct, it’s still pretty close and I still like my chances of winning if my win probability is 70%. That being said, I admire Belichick for “playing to win the game” and going for it, since you could pretty much guarantee victory by converting the 4th and 2.

Bottom line, both sides of this debate can make a strong argument that they are “right.” I don’t think Belichick was “wrong” for bucking conventional wisdom or because his decision didn’t work out. Nor do I believe his process was inherently right because he (seemingly) relied on statistical analysis to make his decision. There is something to be said for punting the ball away and increasing the degree of difficulty for Peyton Manning and the Colts. However, there is also something to be said for having confidence in a two-time Super Bowl MVP gaining 2 yards.

In short, I think the media is being ridiculous for killing Belichick over his “wrong” decision and I also think the statistically inclined are being just as absurd for ridiculing the media by claiming Belichick’s process was “right.”

19. Mr Falcon says:

This entire argument is ridiculous. Good decisions sometimes lead to bad results.

If there wasn’t the possibility of a bad result it wouldn’t be a “decision.” All the decision-maker can do is use the best information available to take the course of action with the highest percentage of success. It’s irrelevant whether this information is based on ‘conventional wisdom’ or ‘metrics’ (stats are usually better bc emotion is somewhat removed, but this is a bit of a false dichotomy bc most conventional wisdom is based on a large collective experience which is, in itself, a ‘metric’).

20. rfs1962 says:

I like the decision to go for it. Suppose the Patriots kick it and do in fact have a 9-in-14 chance of preventing a touchdown. That’s a 64 percent chance of winning the game. If their chance of making a first down is greater than 64 percent, the call is correct. But, as has been pointed out, if they fail they can still win by making a stop, so their chance is actually greater than their odds of succeeding on fourth-and-2. Most NFL coaches hate hate hate having a game come down to one play that involves a true decision by a coach. I applaud Belichick for looking at whatever he looked at and taking the appropriate risk.
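
The break-even reasoning in this comment can be checked with a short Python sketch. It is only an illustration under stated assumptions: punting is taken to win 9 times in 14, a converted fourth down is treated as a certain win (as the comment implies), the function names are invented for the example, and the 25% short-field stop rate is a hypothetical value, not a figure from the thread.

```python
def win_prob_go(p_convert, p_stop_short_field=0.0):
    """Chance of winning when going for it.

    Assumption: a conversion wins the game outright, and a failed try
    can still be rescued by a defensive stop on the short field.
    """
    return p_convert + (1.0 - p_convert) * p_stop_short_field


def break_even_conversion(p_win_punt, p_stop_short_field=0.0):
    """Conversion rate at which going for it is exactly as good as punting.

    Solves p + (1 - p) * s = w for p, where w is the win chance after a
    punt and s is the chance of a stop after a failed fourth down.
    """
    return (p_win_punt - p_stop_short_field) / (1.0 - p_stop_short_field)


P_WIN_PUNT = 9 / 14  # the 9-in-14 stop rate cited in the comment

# If a failed try always loses, the offense must convert 9 times in 14.
print(round(break_even_conversion(P_WIN_PUNT), 3))

# Hypothetical: a 25% chance of a stop after failing lowers the bar.
print(round(break_even_conversion(P_WIN_PUNT, 0.25), 3))
```

Any nonzero chance of a stop after a failed try pushes the required conversion rate below 64 percent, which is the point the comment makes in words.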

• rfs1962 says:

Or, what Keith said.

• Mac says:

The part that was hard to swallow for most is that both the chance of gaining a 1st down and the chance of stopping them from their end were both lower than punting and defending, but if you add both together, they may be slightly a better chance than punting. I say may be because we don’t have the stats to calculate the probability to that precision.

In this case, using the available data and assuming there is going to be a lot of error in the assumptions made, it is really a wash because there were no stats to adjust all of the variables and when you calculate averages, there was not much difference between the two.

I think the lesson to be learned here is that collective wisdom can also be misapplied (the general idea of always punting from the 20 is because a failure almost guarantees 3 points, which was irrelevant in this case) as can statistics (I’ve seen many people use average occurrences to stand as absolute certainties in this situation).

21. Nick says:

So if Belichick had called for a punt, and the Colts blocked it, then that would mean punting was the wrong decision? Asinine. The logic that “the result matters” completely caves in on itself unless you can prove that punting the ball away is a 100% guarantee of victory.

22. Jordan says:

“this claim amounts to saying that we should abandon induction as a tool for evaluating choices.”

No, it amounts to saying that there’s little value in generalizing from one example. It’s as if I pointed to this piece and said that Freddie never writes anything sensible.

24. Michael Drew says:

Here’s something else being overlooked: this is not a binary decision. On the one side, yes, there is pretty much just one route if you choose it: because this would have been a punt from deep in his own territory, there was pretty much just one way for Belichick to go about it: “Okay Robo-Leg, go out there and pound the laces off it!”

On the other hand, the “go-for-it” option is not a simple on-off choice: you have your entire playbook at hand to apply to the question of how you go for it. Therefore, it seems like a strange point of analysis to leave off at the decision whether to punt or go for it and look at the ensuing events as governed by chance (even if it is chance as governed by probabilities revealed in the results of all past comparable decisions). Rather, the events continued to be governed primarily by Bill Belichick (if he called the play), and then Tom Brady (depending on how much latitude the play allowed him to choose receivers), and to a lesser extent, Kevin Faulk (depending on how much discretion he had in how to run the route). To my eye, the choices made after the decision to go for it by the combination of those men (and to some extent other Patriots) clearly drove the outcome, and were clearly not sufficient to achieve the desired outcome, and could never have been.

Here’s why. Whether the play called for Brady without variance to pass to Faulk or he made that choice, whether the play called for Faulk to turn for the ball precisely at the 30 yard line and for Brady to deliver it there or some degree of discretion was granted to Faulk, one thing is obvious to me: the circumstance called, once the decision had been made a) to go for it and b) to pass, for the receiver to be at least 1-2 yards past the yardline needed for the first down when the ball arrives. Clearly, the defense will be defending the first down line with all available vigilance, and one can count on being tackled immediately if one receives the ball there, likely backward in the way Faulk was, as the defender will be rushing up to that line from his position (only linemen line up less than a yard-and-a-half from the line of scrimmage). In that case, just as occurred, in all likelihood the receiver will fall backward, and the actual final position of the ball will be well behind the first-down marker. As such you will be entirely dependent on being granted every inch of your receiver’s forward progress by the referees in order to get the first down. When the yardline needed is precisely the 30 (the series having started after a touchback), the refs then essentially would have to simply decide to hand you the first down, and thus the game. I can see no reason to make that such a likely outcome by throwing just to the yardline needed and no further. Are the completion statistics on fourth-and-two pass attempts that much better for two-yard throws versus three- or four-yard throws? They can’t be, as they aren’t kept: only the total yardage (pass+run-after-catch) of plays is kept that I am aware of.

I don’t know the specifics of the play that Belichick called (i.e. to what degree it was not executed correctly), but clearly the above was not communicated to the players in a way that ensured they heeded it. In my view, this makes the decision-making by Belichick poor, as there is simply no case for the play that was called, and Belichick clearly could have been, if he wasn’t in fact, the one to make it.

All of this goes to resist the notion that “going for it” in those circumstances can be said in itself to be a justifiable decision. That cannot be true when the events that determine whether the decision to go for it proves to be a good one are not outside the influence of the same decision maker who makes the decision. Examining the decision not to punt separately from the decision of what other play to call (and what finer-grain instructions to give to the team) is an arbitrary isolation of one aspect of what is in fact a fully integrated set of linked decisions. The play that was called was a bad decision; we know this because the play essentially succeeded (the pass was completed), but it was not equal to the requirements of the situation.

• Mark Harrison says:

That argument may have some grounds if it wasn’t totally bogus. Faulk did catch the ball past the 30 yard line (or at least that’s where he first got his hands on it). If he had caught it cleanly it would have been a first down without a doubt. As it was, even with the slight juggle it was still a pretty poor call by the umps. If the Pats had had a challenge they would have quite possibly won (it’s close, but on challenging the spot they certainly had enough of a case that the actual spot they got wasn’t correct).

As for the overall debate, on some occasions, Freddie, you seem to suggest that people have justified going for it by using their statistical methods to manipulate the data to back up their opinions. Surely if their methods are devised prior to the event happening then this accusation holds no water? Obviously people on here and so on can then pick and choose the methods they use to defend their opinion, but Belichick can’t be accused of that. He has a method for doing things and over the long haul it’s proven very successful.

Aside from any statistical way of looking at things I personally would be far more happy to go for it in that situation. The Colts were moving the ball pretty well and the Pats were looking gassed.

Most of the statistical arguments suggest the decision is about 50/50. Given this, doesn’t it suggest that the decision was in fact based on footballing instinct as much as the arguments made using numbers? I guess people could claim ego made him go for it, wanting people to say how smart he is, but the impression I’ve always got is that his main motivation is a desire to win and I can’t see that he would have gone against what he thought gave them the best chance.

It would be very interesting to see the media’s reaction had they indeed got the first down. I suspect that the majority who are slating the decision would be praising its gutsy-ness and forward thinking.

• Michael Drew says:

Then I simply amend to say the pass needed to be 2-4 or more yards past the 30, enough to avoid being tackled back over the 30. I didn’t even see the bobble, but refs routinely don’t give the precise point where the catch is made when a receiver is immediately tackled backward; they just assume some interval occurred between the ball hitting the hands and possession being established. The point is the play was executed (and, by appearances, was drawn up, but clearly that is a crucial question wrt Belichick’s decisionmaking) in a way that left the result very likely in the refs’ discretion. If you’re gonna take the chance on passing, I’m saying, then leave no doubt. Me personally, I’d have max-protected and run or even delay-sneaked (even with Brady).

25. mike says:

Why does the double-play combo matter with two outs?

No one actually thinks a walk is as good as a hit. That’s what little league coaches say to their hitters when the pitcher is having trouble throwing strikes.

26. mark says:

This column reeks of ignorance. Ignorance aside, you fail to consider the fallacy of judging the Colts’ scoring chances based on their previous 14 drives. Had the Patriots punted, the Colts would have been operating in 4 down mode, which would have increased the likelihood of a touchdown.

Even if you throw out “complex” statistics (which I wouldn’t), you have to figure what Bill was thinking was that the Patriots had a better than 50% shot of making the first down and the Colts had a better than 50% chance of scoring the touchdown, which, as statistics showed, was not a bad assumption. The decision was proper–at worst it’s a coin flip where he controls his own destiny. At best, he is making a brilliant decision that failed a small percentage of the time.

27. Tom N. says:

Freddie, you’re vilifying Football Outsiders simply as a point of semantics. Clearly, the point the FO guys were trying to make is that you don’t judge a decision based on a sample of 1 play. You judge that decision after you have a large enough sample to eliminate a lot of the luck from the situation. If that play had happened 100 times, I’m sure the FO guys would have said that the result is important. But it was one play. The result of one play is not relevant. The result of dozens or hundreds of plays IS relevant. My guess is that both you and FO would agree with that. But you’re focusing on the wrong part of the argument.

Let me give another example. Over many years, the Yankees have brought in Mariano Rivera to close hundreds of games. He’s one of the best, if not THE best relief pitcher in baseball. Bringing him in to close a game is as close to being a no-brainer decision as you’ll find in all of sports. It has been a successful decision hundreds of times in the past, at a very high percentage.

Now, let’s say Rivera comes into a 1-run game, and allows two runs to lose the game. Did Girardi make a mistake by bringing in Rivera? Of course not. The result of that game was not important. It was the process that was important. Girardi made a decision that maximized his team’s chances of winning. Just because it happened not to work this one time, doesn’t mean it was the wrong decision.

I also wanted to point out that Sean Payton of the Saints goes for it on 4th and short very often (though usually in the opponent’s half of the field). Sometimes it works and sometimes it doesn’t. But it works often enough that the Saints have probably scored more points and won more games over the years than they would have if they had gone the “safe” route and punted or kicked FGs every time.

Also, what if the Patriots had punted away and the Colts had still engineered a game-winning drive? Would you be saying “The Patriots made the wrong decision, they should have gone for it on 4th down!”?

Next, you say that the Patriots had stopped the Colts on a majority of drives, so the Patriots should have punted. But you’re ignoring the other relevant factors. Namely, the odds of the Patriots winning if they go for it and convert and the odds of the Patriots winning if they go for it and don’t convert. The equations calculating the odds are as follows:

Odds of winning by going for it = (Odds of getting a first down * Odds of winning the game if you get a first down) + (Odds of failing to get a first down * Odds of the Colts scoring a TD from 30 yards out)

Odds of winning by punting = Odds of Colts scoring a touchdown from about 70 yards out

If the result of the first equation is greater than the result of the second equation, you go for it. If not, you punt. The problem is coming up with values for those variables. The only things you know outright are that the odds of winning if converting the 4th down are close to 100%, and the odds of the Colts scoring a TD from 30 yards out are greater than the odds of the Colts scoring a TD from 70 yards out.

Brian Burke of advancednflstats.com looked at the actual results of actual NFL games over several years (no theoretical percentages; all of it was actual, observed data) and found that teams on average convert 4th and 2 about 60% of the time, teams score from 30 yards out with 2 minutes left in the game 53% of the time, and teams score from 70 yards out with 2 minutes left in the game 30% of the time (which supports your “The Patriots would stop them a majority of the time” argument).

Now, obviously that is the average of all teams put in those positions, and you can’t use those exact numbers to calculate the odds. Every situation is different, for various reasons (the qualities of the offenses and defenses, the weather, the fatigue of the players, etc.) But they’re baseline numbers around which you can make estimates.

I used the equations above and plugged in various combinations of odds of converting the 4th down, and of the Colts scoring from both 30 yards out and 70 yards out. In virtually all of them, the odds of winning when going for it were greater than the odds of winning when punting. For instance, let’s say the Patriots didn’t have a 60% chance of converting, but only a 40% chance of converting. And let’s say the Colts had a 50% chance of scoring from 70 yards out. The Colts’ odds of scoring from 30 yards out would need to be 85% to make punting the right choice! And even then, it’s only a 1% advantage.
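
The plug-in exercise described in this paragraph can be sketched in a few lines of Python. This is an illustration, not the commenter's actual working: it uses the "Colts NOT scoring" form of the equations (the version given in the follow-up correction further down), treats a conversion as a certain win, and the function names and grid values are assumptions made up for the example.

```python
def win_go(p_convert, p_td_30, p_win_if_convert=1.0):
    """Win chance when going for it on fourth down."""
    return p_convert * p_win_if_convert + (1 - p_convert) * (1 - p_td_30)


def win_punt(p_td_70):
    """Win chance after punting: the Colts must fail to score a TD."""
    return 1 - p_td_70


def break_even_td_30(p_convert, p_td_70):
    """Short-field TD rate at which the two choices tie.

    Assumes a conversion is a certain win. Solving
    p + (1 - p) * (1 - x) = 1 - q for x gives x = q / (1 - p).
    A result above 1.0 means punting can never come out ahead.
    """
    return p_td_70 / (1 - p_convert)


# The pessimistic case from the comment: 40% to convert, 50% from 70 yards.
print(round(break_even_td_30(0.40, 0.50), 3))
print(round(win_go(0.40, 0.85), 2), round(win_punt(0.50), 2))

# A small illustrative grid of inputs (values chosen for the example).
for p_convert in (0.4, 0.5, 0.6):
    for p_td_70 in (0.3, 0.4, 0.5):
        x = break_even_td_30(p_convert, p_td_70)
        print(p_convert, p_td_70, round(x, 2))
```

For the pessimistic case the break-even short-field TD rate comes out near 0.83, so at 85% punting is ahead by about one percentage point, which agrees with the figures in the comment.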

Clearly, nobody knows exactly what the odds are, but for anybody to say that Belichick DEFINITELY should have punted is ridiculous. It was a close call either way.

Lastly, I think you’re right that experience needs to count for something. But experience is not infallible. Most conventional wisdom was derived decades ago, and things change over time. There was a time when conventional wisdom said that a series of short, timing-based passing routes would never work. But Bill Walsh experimented and discovered that the conventional wisdom was wrong. His “West Coast Offense” can and does work. There was a time when running the ball out of a shotgun formation was unheard of. Now teams do it all the time, and do so successfully. Conventional wisdom said that something like the Wildcat formation would never work. Then Bill Parcells (a football traditionalist if there ever was one) and Tony Sparano tried it out and it helped a 1-15 team improve to a division champion 11-5 team.

Conventional wisdom finds itself outdated often, because the game changes over time. The same things that worked 50 years ago won’t necessarily work now, just like how the things that work now might not have worked 50 years ago.

Lastly, I think you need to consider another factor that comes into play as far as coaching decisions go, and that’s the desire to defer blame. Every coach can be fired at any time, and I think many coaches make decisions to avoid criticism from the fans and the media. The safe, timid play call defers blame to the players. “Hey, I put the game in their hands, they just didn’t execute”. I don’t necessarily think coaches do this on purpose, but their job security (or lack thereof) has to affect them at least subliminally. It happens all the time, where a team is down 21-0 in the 4th quarter and they kick a FG instead of going for it, or a 1-9 team punts on 4th and 1 from the opponent’s 40 yard line.

Oh, one last thing and then I promise I’m finished. Whoever said that stats people think Adam Dunn’s defensive liabilities outweigh his offensive contributions is outright wrong. It may bite into his offensive contribution, but his offense clearly outweighs his defense under ANY advanced statistical system.

28. Tom N. says:

Sorry, I made a little mistake in the equations, they should say:

Odds of winning by going for it = (Odds of getting a first down * Odds of winning the game if you get a first down) + (Odds of failing to get a first down * Odds of the Colts NOT scoring a TD from 30 yards out)

Odds of winning by punting = Odds of Colts NOT scoring a touchdown from about 70 yards out
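
As a rough worked example, the corrected equations can be evaluated with the league-average figures quoted in the previous comment (60% conversion, 53% TD rate from 30 yards out, 30% from 70 yards out). This is a minimal Python sketch: the function name and parameter names are hypothetical, and the default 100% win chance after a conversion is an assumption based on the "close to 100%" remark.

```python
def win_odds(p_first_down, p_td_from_30, p_td_from_70, p_win_after_first_down=1.0):
    """Return (odds of winning by going for it, odds of winning by punting).

    Implements the corrected equations from the comment above:
      go   = P(first down) * P(win | first down)
             + P(no first down) * P(Colts do NOT score from 30)
      punt = P(Colts do NOT score from 70)
    """
    go = (p_first_down * p_win_after_first_down
          + (1 - p_first_down) * (1 - p_td_from_30))
    punt = 1 - p_td_from_70
    return go, punt


# League-average inputs attributed to Brian Burke in the thread.
go, punt = win_odds(p_first_down=0.60, p_td_from_30=0.53, p_td_from_70=0.30)
print(f"go for it: {go:.3f}  punt: {punt:.3f}")
```

With those averages, going for it comes out at roughly 0.79 against 0.70 for punting, in line with the 78% and 70% figures cited earlier in the thread. These are baseline numbers only; the comment itself stresses that the real inputs for this game are unknown.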

29. Vidor says:

What a terrible essay. Of course the Football Outsiders column was talking about evaluating INDIVIDUAL decisions based on their outcomes. If ten people make Decision A because the evidence shows that 90% of the time A is the correct decision, then it would be ridiculous to say Person 8 made the wrong decision because things turned out badly for him while Persons 1-7 and 9-10 got the good outcome that they had every reason to expect. In fact Football Outsiders, Bill Belichick, and the rest are using inductive reasoning from all the past results of short yardage situations, points allowed after punting, etc. The writer, for his part, bases his argument on nothing; note how much he uses “I think” and “I believe”.