Mini-Throughput: The Triple Lindy Stats Flop

by Michael Siegel · May 7, 2020

This week on the Twitters, a mini-controversy erupted over COVID-19 projections. The latest model projections for the toll of Coronavirus indicate that if social distancing is ended, cases will surge to 200,000 a day, with 3,000 deaths a day, by June. However, the White House is skeptical, favoring much more optimistic projections. In response to the story, they tweeted this out:

To better visualize observed data, we also continually update a curve-fitting exercise to summarize COVID-19’s observed trajectory. Particularly with irregular data, curve fitting can improve data visualization. As shown, IHME’s mortality curves have matched the data fairly well. pic.twitter.com/NtJcOdA98R

— CEA (@WhiteHouseCEA) May 5, 2020

That shows the predictions of various models for the progression of the pandemic. The Institute for Health Metrics and Evaluation (IHME) models, which the White House had been favoring, are very optimistic and have come under fire. But even their most recent revision, which does not account for the end of social distancing, predicts that COVID will continue to ravage the country at this level for a few more weeks before slowly fading away (the green line). The one the White House now favors and wants to base policy on is the red line. This is a “cubic model” from economist Kevin Hassett that goes beyond even the optimism of the IHME model to predict that COVID deaths are about to fall dramatically, disappearing within the next couple of weeks.

If you heard a loud noise on Tuesday morning, that was probably the sound of every epidemiologist, every economist, every mathematician, every scientist and every virologist simultaneously face-palming and then crashing to the ground in a dead faint. You may have noticed shares in smelling salt companies surge on the Dow. The reason is that while that “cubic model” fit might look reasonable, it is utter, absolute garbage. It is the kind of thing that would get a D if it were turned in by a freshman in stats class.

Walk with me a bit.

How Data Becomes Prediction

Science lives in a dizzying sea of numbers that we call “data”. Data are the foundation stones of all science because they are what is real … or rather, what is measured. Data can be misleading. They can be wrong. They can be biased. But they are the tether between the reality of the world and our way of understanding that reality that we call “science”.

In the case of COVID, the data are the number of positive tests, the number of confirmed deaths, the number of hospitalizations. Those facts are a shadowy representation of the underlying reality — how many people have caught the virus, how many have gotten seriously ill and how many have died of it. There has been a great deal of debate about the connection between those two things — whether we are over- or under-counting COVID deaths, how many of the infected are not being tested, etc. These are what we call “biases” in the data. And they’re something you have to account for if you’re to understand what the data are telling you.

Assuming you can understand the biases, you can then use your data to breach the underground chamber that contains the fundamental questions you are trying to answer. In this case: How infectious is the disease? How much exposure does someone need to become sick? How deadly is the disease? How long will it be before someone knows they’re sick and seeks help? How will society respond to the disease? How will that response change from place to place? If you can get answers to those questions, even partial ones, you can start making predictions about what will happen, given certain assumptions.

Models are one of the tools we use to answer fundamental questions. They essentially turn the process around. If we assume certain answers to the fundamental questions, what would the data look like? Can we change those answers until the prediction of the model matches the data we have on hand? These models are not conjured out of the air but are based on prior knowledge … in this case, how previous pandemics have behaved.
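To make "change those answers until the prediction matches the data" concrete, here is a minimal sketch in Python. It assumes a toy model in which cases grow by a fixed fraction each day. The function names, the starting count of 100 and the hidden "true" rate of 0.20 are all invented for illustration and have nothing to do with any real COVID model.

```python
def simulate(growth_rate, days, start=100.0):
    """Toy model: cases grow by a fixed fraction every day."""
    cases = [start]
    for _ in range(days - 1):
        cases.append(cases[-1] * (1.0 + growth_rate))
    return cases


def squared_error(predicted, observed):
    """How badly the model misses the data (smaller is better)."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_growth_rate(observed, candidates):
    """Try each candidate answer and keep the one that best matches the data."""
    return min(candidates,
               key=lambda r: squared_error(simulate(r, len(observed)), observed))


# Pretend these are our measurements (generated with a hidden rate of 0.20).
observed = simulate(0.20, days=14)

# Candidate growth rates from 1% to 50% per day.
candidates = [i / 100 for i in range(1, 51)]
best_rate = fit_growth_rate(observed, candidates)
print(f"best-fitting daily growth rate: {best_rate:.2f}")
```

Matching fourteen days of past data tells you nothing about whether the same rate holds on day fifteen, which is where the trouble starts.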

Models are a contentious subject in science. My view is that they are useful tools but are slippery when they try to make predictions. The reason is that there are sometimes many models that can correctly “predict” what has already happened. The make-or-break point is whether they correctly predict the future. And the scientific literature is liberally littered with models that explained the past perfectly but then crashed and burned when they tried to predict the future.

The road from data to bias-free data to theory to prediction is long, winding and has many false branches. Scientists have devoted their lives to this. And many have been working round-the-clock on the COVID crisis to figure it out. The difficulty of prediction is why the models for COVID-19’s spread and ultimate toll have varied so much, especially in the early days when we knew a lot less than we know now. We knew that COVID-19 could infect millions. We knew it could kill hundreds of thousands. But just how many and just how fast depended a lot on your assumptions — about the disease itself and about society’s response.

And most epidemiologists have been up-front about this, stating their assumptions, their uncertainties and welcoming criticism. Model makers rarely have a single prediction but usually a variety of predictions for various possible courses. Figuring this stuff out is hard. It involves a lot of work, a great deal of knowledge, a familiarity with past epidemics, attention to your underlying assumptions and a significant amount of humility. But unless we plan on a return to reading sheep entrails, it is our only option.

Curve Fitting, For the Loss

The White House, however, has taken a different approach. From the beginning, they have favored models that predicted a much milder epidemic. But what they have now done, apparently, is throw over even these models in favor of putting the number of cases into a spreadsheet and applying a cubic fit from the pull-down menu. It’s a sophisticated version of connect-the-dots but where half the dots weren’t there so you just drew the lines where you think they should be. They are pretending that this constitutes a model.

But it doesn’t. A simple mathematical fit to data does not account for biases in said data. It tells you nothing about the underlying fundamentals. It reveals no underlying information about the disease or our society. If epidemiological models are a level, this is holding your thumb up, closing both eyes and saying, “Eh, looks fine to me.”

The simple truth is that you can rarely apply such simplistic analysis to real-life behavior. And you especially cannot apply it to something that has already shown that it doesn’t behave that way. The infection curves we’ve seen across countries have shown a sharp rise, a plateau and then a slow decline. We have not seen the sharp bell curve predicted above. This “cubic model” fit assumes that we have perfect information on the virus, which we do not have, and that the virus behaves in a way that it clearly does not.

What’s more, remember what I said about how models can crash and burn? Fitting a simple equation is a “model” that is particularly vulnerable to crashing and burning. The COVID epidemic is still in progress. We have not seen any fall in cases so far in the United States. As a result, that dramatic drop in caseloads is not based on any data at all. It is entirely theoretical, assumed because they used the equation to fit one side of the data. This is the equivalent of assuming that since one side of the Titanic doesn’t have any holes in it, the other side must not either.
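Here is a minimal sketch of the problem in Python. The data are synthetic: a made-up daily death count that rises and then levels off near 2,000, which is my stand-in for "rise, then plateau" and not real COVID data. The least-squares routine is written out by hand so the example needs no libraries. I am assuming a spreadsheet's cubic trendline does the equivalent calculation.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


def fit_cubic(xs, ys):
    """Least-squares cubic: returns [c0, c1, c2, c3] for c0 + c1*x + c2*x^2 + c3*x^3."""
    # Normal equations: (X^T X) c = X^T y, where X has columns 1, x, x^2, x^3.
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(4)] for i in range(4)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(4)]
    return solve(xtx, xty)


def cubic(coeffs, x):
    """Evaluate the fitted cubic at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def daily_deaths(day):
    """Synthetic data: rises, then plateaus near 2,000 a day. Not real numbers."""
    return 2000.0 / (1.0 + math.exp(-(day - 20) / 3.0))


def rescale(day):
    """Map days 0..40 onto -1..1 so the fit is numerically well behaved."""
    return (day - 20) / 20.0


days = list(range(41))  # the 41 days we have "observed"
coeffs = fit_cubic([rescale(d) for d in days], [daily_deaths(d) for d in days])

for day in (40, 50, 60):
    print(f"day {day}: plateau model {daily_deaths(day):7.0f}, "
          f"cubic fit {cubic(coeffs, rescale(day)):9.0f}")
```

Inside the fitted window the cubic roughly follows the data. Run it forward to day 60 and it predicts a negative number of deaths, while the process that generated the data is still sitting on its plateau. The plunge comes from the shape of a cubic, not from anything in the observations.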

This comic from XKCD illustrates the problem of fitting mathematical formulae to data better than I ever could:

Fitting data with a variety of mathematical models. Source (XKCD)

Note especially that last panel. The crazier the model, the bigger the assumptions, the wilder the results you get when you extend it beyond your data sample. If you’ve got a mathematical model that explains, say, stock market prices from 1950 to 2010, don’t be surprised if it predicts that the Dow hit -24,000 in 2012. It is intrinsic to mathematics that you get crazy predictions when you go outside your data sample. Mark Twain probably expressed it best, when talking about how engineering projects had shortened the length of the Mississippi River over the years:

In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oolitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upwards of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.
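Twain's arithmetic is ordinary straight-line extrapolation, and it is easy to reproduce. The sketch below, in Python, uses only the two figures in the passage (242 miles lost over 176 years). The variable names are mine.

```python
miles_lost = 242
years_observed = 176

# The "trifle over one mile and a third per year".
miles_per_year = miles_lost / years_observed
print(f"shortening rate: {miles_per_year:.3f} miles per year")

# Run the same straight line a million years into the past.
extra_miles_a_million_years_ago = miles_per_year * 1_000_000
print(f"extra river length a million years ago: "
      f"{extra_miles_a_million_years_ago:,.0f} miles")
```

The rate works out to 1.375 miles a year and the extrapolated extra length to 1,375,000 miles, which is Twain's "upwards of one million three hundred thousand". The arithmetic is correct, and that is the joke: nothing in the calculation warns you that the straight line stopped meaning anything outside the 176 years of observations.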

Another illustration of the point:

According to predictions from my “cubic model” fitted on yesterday’s data, today’s temperature is approaching absolute zero and we are all dead, mercifully. pic.twitter.com/jfNgpa2wBH

— John Voorheis (@john_voorheis) May 5, 2020

And another:

Check out the new predictions of my Cubic Model of the stock market. Looks like the economy will recover in no time! https://t.co/GLMsHequIy pic.twitter.com/qwo8IpcflL

— Kareem El-Badry (@kjb_astro) May 6, 2020

And another:

Fit a cubic model to my total number of citations by year and I’m very excited to share I will have -850 citations by the year 2040 pic.twitter.com/0WEbgkfdQX

— Mark Hibbins (@mhibbinsbiology) May 6, 2020

I should note that this is not Hassett’s first goat rodeo with this sort of thing. He was co-author on a book that predicted the Dow would hit 36,000. And he used similar “fitting” to argue that tax cuts would pay for themselves:

It’s come to my attention the “economist” in charge of this ridiculous claim is the same one who made this pants on head level dataviz circa 2004. This in monstrous incompetence during a pandemic pic.twitter.com/7Xax48YZEw

— Sean – now with 10% more social distancing! (@SeanfromSeabeck) May 5, 2020

The White House isn’t the only group pulling this sort of flim-flammery on COVID-19 either. This website contains similar fits to the data from a variety of countries. But while some countries (the Netherlands, France, Germany) have followed that curve shape reasonably, others (Italy, the US) have not. Moreover, the models keep changing. Two weeks ago, they predicted the virus would be mostly gone by mid-May and entirely gone by Labor Day. Today’s analysis shows that extending out to June and October, respectively.

This touches on one of the subtle deceptions of the White House tweet. They show multiple IHME models that have changed as more information on the epidemic has come in. But they show only one cubic fit. This would give you the impression that even the IHME, as optimistic as they are, doesn’t know what they’re doing and the cubic fitters do. What is ignored is that if you had run their “cubic model” say, a month ago, it would have predicted that the epidemic would be over by now. Simplistic curve-fitting is extremely vulnerable to assuming that however many cases you have right now — be it two, two thousand or two million — is the peak. And that you’ll get back to zero as fast as you got there from zero.

In the end, the reason the White House is taking this “model” seriously is that it’s telling them what they want to hear, what even the IHME won’t tell them: that we can end the lockdowns and the virus will simply disappear within a couple of weeks. There is absolutely no reason to believe this. Social distancing and lockdowns have brought the infectiousness of the virus down to just about 1 so that the virus is leveling off. But they haven’t brought it below 1, so that the virus starts to fade.¹ Models may come and go. But if history and this pandemic have taught us anything, it’s that “opening the economy” is not going to drop the infection rate.
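The difference between "about 1" and "below 1" is easy to see with a toy calculation. The sketch below, in Python, assumes the simplest possible picture: each "generation" of infections is just the previous one multiplied by a fixed reproduction number. The starting count of 1,000 and the three values of R are illustrative, not estimates.

```python
def cases_by_generation(r, generations, start=1000.0):
    """Each generation of infections is the previous one times r."""
    cases = [start]
    for _ in range(generations):
        cases.append(cases[-1] * r)
    return cases


for r in (0.9, 1.0, 1.1):
    final = cases_by_generation(r, generations=10)[-1]
    print(f"R = {r}: 1,000 cases become about {final:,.0f} after 10 generations")
```

At R = 1 the count simply holds. Only below 1 does it shrink, and anything that nudges R above 1 sends it the other way.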

To be fair, treating the United States as one giant block of data is silly. We are a far-flung nation with a broad range of responses to this. And the virus is doing different things in different places. The NYT has a breakdown by state that shows that, in places like New York, the virus peaked a few weeks ago and is now in decline while in places like Iowa, it’s still on the rise.

Michael Siegel

Michael Siegel is an astronomer living in Pennsylvania. He blogs at his own site, and has written a novel.

Pinky says:

May 7, 2020 at 8:09 am

I’m confused. Doesn’t the original tweet say the IHME projections fit well? There are four projections on the graph, and the red one is the only one not labelled “IHME”. Are all the IHME projections cubic? If not, why are people making fun of the cubic curve?
- Michael Siegel in reply to Pinky says:
  
  May 7, 2020 at 9:06 am
  
  The cubic one is now the one they are favoring over the IHME (which is more complex and does not yet account for loosening restrictions). I’ve edited the post a bit to make this clearer.
  - Pinky in reply to Michael Siegel says:
    
    May 7, 2020 at 9:32 am
    
    OK, but it does contradict Tuesday’s post from the CEA, which says “As shown, IHME’s mortality curves have matched the data fairly well.” That post is featured in your article. Kareem El-Badry’s tweet is a reply to it, but that post specifically endorses the other three methods shown on the graph. Are you sure you’re reading this story right? The CEA sends out a post with four curves and mentions three of them, and now people are complaining about the fourth?
    
    And it isn’t like “the curve” is some radical, simplistic departure from the way we’ve been talking about the pandemic. We’ve been bending and flattening that bad boy all year. So why are people suddenly complaining about it?
    - Michael Siegel in reply to Pinky says:
      
      May 7, 2020 at 9:37 am
      
      I didn’t want to go too much in depth here, but the White House has been vigorously defending the IHME model, which predicts way fewer deaths than anyone else. The tweet was meant to defend that. But it also included the “cubic model fit”. And we found out that they are basing policy not even on the uber-optimistic model but on that cubic model fit. So the conversation is like so:
      
      “The IHME predicts way fewer deaths!”
      “Their models are too optimistic.”
      “Well here’s a graph showing it’s done a good job so far.”
      “Well, it really hasn’t but … wait, what’s that red line?”
      “Oh, that’s a cubic fit model. We think that’s even BETTER than the IHME.”
      “WHAT?!”
      
      While a simplistic curve is useful for demonstrating the concept of “bending the curve”, you want to use more sophisticated methods for policy. The difference between Hassett’s “model” and the IHME is about 60,000 deaths. The difference between basing policy on Hassett’s model and anything else could be hundreds of thousands of deaths and millions of people getting very very sick.
      - Pinky in reply to Michael Siegel says:
        
        May 7, 2020 at 9:46 am
        
        Thanks for the clarification.
        
        Michael Siegel in reply to Pinky says:
        
        May 7, 2020 at 9:47 am
        
        Thanks for making me clarify! I knew this post might be a bit confusing so good to get some early feedback.
      - George Turner in reply to Michael Siegel says:
        
        May 7, 2020 at 10:25 am
        
        The White House tweet said
        
        To better visualize observed data, we also continually update a curve-fitting exercise to summarize COVID-19’s observed trajectory. Particularly with irregular data, curve fitting can improve data visualization. As shown, IHME’s mortality curves have matched the data fairly well.
        
        They just call it a “curve-fitting exercise” to summarize, and refer to it as helping with data visualization. There’s no indication they threw in a cubic fit for anything other than comparison to the real data.
        
        I suspect this outrage might be like CNN’s article yesterday that talked about the sound of a flushing toilet during oral arguments before the Supreme Court.
        
        Michael Cain in reply to George Turner says:
        
        May 7, 2020 at 11:36 am
        
        They just call it a “curve-fitting exercise” to summarize, and refer to it as helping with data visualization. There’s no indication they threw in a cubic fit for anything other than comparison to the real data.
        
        My complaint is that we know the underlying process is a sigmoid, or a sum of sigmoids. Most cumulative social processes look like a sigmoid: epidemics, consumer adoption of products, etc. Given that, I complain when people say, “Oh, we’ll use a cubic model here.” If they’re going to do curve-fitting, at least use a curve that matches the underlying process and extrapolates in a sane fashion, rather than a cubic that’s going to do insane things as soon as you move outside the fitted region.
        
        Mike Schilling in reply to Michael Cain says:
        
        May 8, 2020 at 4:04 am
        
        The cubic fit is like a sigmoid colon.
        
        Philip H in reply to George Turner says:
        
        May 7, 2020 at 11:42 am
        
        Except it’s not. Not by a long shot.
        
        One of the things that really rankles us scientists is when our data are presented without the assumptions that underlie them and the uncertainty that comes with them. One of the many problems here is that while the IHME’s curves are a reasonable fit visually, they aren’t as good as other models statistically, and the cubic model is even worse. This means the IHME and cubic models probably use less refined starting (or initializing) conditions and less precise assumptions than the UW model (as one example). That means that if you look at the actual dots from the model making up the curve, you see they are “noisier” and thus less precise.
        
        Why does precision matter? Because policies made on imprecise data mean policies aimed at the wrong goal, which in turn means policies that are less effective, and more wasteful of things like time and money. And in this case more wasteful of people’s lives.
        
        George Turner in reply to Philip H says:
        
        May 7, 2020 at 12:23 pm
        
        Well, compare that to the academic’s model in the UK.
        
        Code review of Ferguson’s model
        
        All I can say is “Yikes.”
        
        Why would they name a big structure P when P is obviously a name for a pointer? Any competent programmer would have called it “my_struct” and passed it to “do_function((void *))”. ^_^
        
        There are more fundamental problems in trying to model the outbreak, because once people and governments started responding, the math became less important than the messaging, as when you’re trying to predict how many people will take the ice bucket challenge next month.
        
        Or take a page from sophisticated aerodynamic simulation and modeling, where you use well-understood data and formulas for thrust, drag, lift, and weight to predict a vehicle’s longer-term future position. Sometimes the most important thing to know is the pilot’s destination. Is he going to San Diego or New York? If you don’t know the answer to that question, all the fancy math is irrelevant.
        
        Oscar Gordon in reply to George Turner says:
        
        May 7, 2020 at 6:31 pm
        
        BTW Everyone should read that code review.
        
        Michael Cain in reply to Oscar Gordon says:
        
        May 7, 2020 at 8:39 pm
        
        I’ll certainly concede a number of bad practices by the academics. OTOH, some of the criticism is unfair. If you replace a module doing some approximation with another one that implements a different approximation, hardcore regression testing goes out the window. Intermediate values are different, so the same inputs produce different final results. There are lots of things that should be the same — data passing through unchanged code should still produce the same results — but those test points may not be readily accessible.
        
        If the insurance companies were doing this, we’d still be waiting while they negotiated the specs and price. At least based on my experience, the first versions of new models are always built on the cheap. Academics are cheap but suffer from all sorts of problems — oddly trained ~~minions~~ graduate students for labor, outside time pressures on that labor, frequent turnover.
        
        It’s been that way for a very long time. I spent an afternoon at one of the Smithsonians once reading the letters that went back and forth between the professors working on discovering the neutron’s properties, chain reactions, and ultimately whether the bomb would work. I loved one of them that read roughly, “Your idea of the 5th is very interesting. However, Bob graduated and is not available to work on it. I expect a new graduate student at the beginning of the term in a couple of months, and will throw him at that problem to see if he sinks or swims.”
        
        Oscar Gordon in reply to Michael Cain says:
        
        May 7, 2020 at 9:49 pm
        
        One of the code bases I am currently responsible for was written by a former academic whose approach to software development was as slipshod as the review suggests, so my sympathy for such practices is more than a little strained.
        
        That said, 2 points:
        
        1) I don’t expect academics to write clean code. They aren’t trained in it, and they rarely have the time nor the budget to do it right. They hammer out the models, and when it’s mature enough, it gets kicked out into the wild, where other people distill the core algorithms and turn it into production code.
        
        2) Knowing this, it is irresponsible to be basing any kind of government policy on such a code base. I know it happens, in big ways and small, every day across the world, but that does not make it a good idea.
        
        I mean, the whole time I was at the Lazy B*, every single modeling code had to be put through a rather rigorous test regime to validate it against known data (read: static, dynamic, wind tunnel, and flight test), and even then, all it did was reduce the number of physical tests you had to do.
        
        One of my coworkers has been building and maintaining a tool for the Nuke Power industry that helps Nuke engineers set up reactor heat transfer simulations, and the rigor just that tool undergoes, much less our main CFD solver, is pretty tight.
        
        So while I’m happy to give academics a pass on bad coding practices, I’m not going to extend the same when they shop their models around to government agencies (or media outlets) in order to influence policy. If your code has the potential to save or end lives, it better be fecking tight.
        
        *And don’t get me spun up again on MCAS.
        
        Oscar Gordon in reply to Philip H says:
        
        May 7, 2020 at 5:51 pm
        
        One of the things that really rankles us scientists is when our data are presented without the assumptions that underlie them and the uncertainty that comes with them.
        
        Preach it, brother!
fillyjonk says:

May 7, 2020 at 11:37 am

Figures don’t lie, but liars figure.

Or, another old saying – which I only remember now as the back-translation-into-English a Russian fellow grad student of mine gave one day: “There are three kinds of lies. The lie, the intentional lie, and the statistic.” (I think the original – possibly Twainian – English had “damn lie” for the second category? But I like Sasha’s version of it better.)

Ironically, about three weeks ago, I covered polynomial regression in my (now entirely online) advanced biostats class and one of the big points I made was that overfitting curves, or exuberantly fitting without considering the realities of the situation, would steer you wrong every time. But of course, I’m just a nobody at a possibly-doomed small public university….
- Philip H in reply to fillyjonk says:
  
  May 7, 2020 at 11:46 am
  
  It’s all about assumptions and the data used for model initialization.
  - fillyjonk in reply to Philip H says:
    
    May 7, 2020 at 12:09 pm
    
    Yeah, and that’s the problem: with computer stats packages you have this flashy shiny tool where you don’t have to consider “what does this actually mean” or “what are the underlying assumptions” and you get bad models that suck people in because a “Real Scientist” made them.
    
    I’ve fallen prey to it a couple times in this, because I want Answers and I am just a stupid ecologist who doesn’t know this stuff. I’m fast learning that some of the people getting big bucks to know this stuff don’t know much more stats than *I* do and that’s kind of scary.
  - Michael Cain in reply to Philip H says:
    
    May 7, 2020 at 12:33 pm
    
    There’s model correctness as well, particularly for extrapolating (forecasting). The IHME model has the benefit of using a sane underlying model: e.g., you can extrapolate indefinitely into the future and it won’t ever forecast that more than 100% of the population will get sick or die. If the WH wants to do curve-fitting exercises, there’s no reason they can’t fit functions that have the same kind of sanity properties as the IHME, except that you can’t do it in Excel.
    - George Turner in reply to Michael Cain says:
      
      May 7, 2020 at 12:57 pm
      
      Excel works fine for the proper epidemic logistic curve simulations. I wrote one months ago that made day-by-day predictions of new cases, total cases, deaths, etc., though I haven’t bothered with keeping it updated.
      
      The trouble with the exercise is that R0 is wildly variable based on people’s behavior and prevention measures, and some of those are effective enough to drive the case numbers toward zero (such as in New Zealand). The logistic curve’s input is the final expected number of cases when the infection stops (herd immunity), which is essentially a guess based on R0.
      
      So you face the nagging question of whether it’s better to have a really good model of a bad guess or a really bad model of a good guess.
- InMD in reply to fillyjonk says:
  
  May 7, 2020 at 11:56 am
  
  Figures don’t lie, but liars figure.
  
  That’s a great saying I’d never heard before and I’m definitely stealing it.
- Michael Cain in reply to fillyjonk says:
  
  May 7, 2020 at 12:24 pm
  
  Twain popularized “lies, damned lies, and statistics.” Twain attributed it to Disraeli, but it doesn’t appear in any of Disraeli’s known works. The phrase did appear in print sometime in 1891, but the author of that particular piece was anonymous.
  
  One of the most fun (and most difficult) math classes I took as an undergrad was approximation theory. At that time, before mammoth amounts of computer storage were available to support table look-up, it was common to have approximations to difficult-but-useful functions that had guaranteed maximum errors over particular intervals. An easy-to-evaluate rational polynomial expression might match some other function to 10 or 12 digits on a specific interval, and would blow up as soon as you moved outside that interval. It made me very cautious, perhaps overly so, about extrapolating outside the range of the data with fitted curves.
Kristin Devine says:

May 7, 2020 at 11:44 am

Great piece! Thanks!!
Eduard de Jong says:

May 7, 2020 at 11:47 am

Assuming that presenting the cubic curve as fitting the data is not done out of malice, it is clear that it’s very hard for most non-mathematicians to grasp the exponential curve.
One of the fundamental mathematical properties of an exponential curve is that for any piece of it there will always be a quadratic, cubic or higher-power polynomial curve that fits that segment very closely, only to deviate from it before and after the segment where it matches.
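A quick way to see this, sketched in Python. The cubic below is the third-order Taylor polynomial of e^x around zero, used here as a stand-in for a cubic fitted to a short stretch of an exponential. A least-squares fit would differ in detail, but it also drifts away outside the stretch it was fitted to.

```python
import math


def cubic_approx(x):
    """Third-order Taylor polynomial of exp(x) around x = 0."""
    return 1 + x + x ** 2 / 2 + x ** 3 / 6


for x in (-5.0, -0.5, 0.0, 0.5, 5.0):
    print(f"x = {x:5.1f}: exponential {math.exp(x):9.4f}, cubic {cubic_approx(x):9.4f}")
```

Between -0.5 and 0.5 the two agree to within about 0.003. At x = 5 the cubic is short by a factor of nearly four, and at x = -5 it is negative, which an exponential can never be.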

The basic model for an epidemic is a known physical process of exponential growth, that is, a process where the amount of growth depends on the size of the population. In nature there are a number of such processes. As the environment in which growth happens is in practice always finite, any exponential growth is always followed by a decline. The decline is usually also exponential. Typically growth stops at a certain level of saturation.
Understanding the basic mechanism behind an observed process is the basis for interpreting the data that can be measured from the process. Those measurements will then reveal the parameters of the process. For an exponential process those parameters are the growth rate and a starting value.

For an unknown process one could try several different curves to see which fits best, and exponential should always be one of the curves tried. That was what I did, though, when studying physics in the ’70s: first plot the data on logarithmic paper, and if the line looks somewhat straight, the process you have observed is exponential, so don’t bother trying any other curve!
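The logarithmic-paper check translates directly into code: take the logarithm of the counts and fit a straight line. A minimal sketch in Python, using made-up counts that double every three days. The function name and the numbers are illustrative assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line through (x, y): returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


days = list(range(10))
counts = [100 * 2 ** (d / 3) for d in days]  # made-up data, doubling every 3 days

# On "logarithmic paper" an exponential is a straight line.
slope, intercept = fit_line(days, [math.log(c) for c in counts])

growth_rate = slope                          # per-day exponential growth rate
starting_value = math.exp(intercept)
doubling_time = math.log(2) / slope
print(f"growth rate {growth_rate:.3f}/day, start {starting_value:.0f}, "
      f"doubling every {doubling_time:.1f} days")
```

The slope and intercept of that line are the two parameters of an exponential process: the growth rate and the starting value. With real counts the points will scatter around the line, and how straight the line looks is the judgment call.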

The nice thing about exponential growth is that it has one critical parameter: growth rate. All measurable effects are proportional to that parameter: For Covid-19 the number of cases, the number of deaths, the number of undiagnosed cases are all proportional to the growth rate. While it is hard to know these proportions exactly, we can choose the quantity we can measure most accurately to learn its single parameter. In the case of Covid, the number of deaths may be the most accurately known.
- Brandon Berg in reply to Eduard de Jong says:
  
  May 7, 2020 at 12:14 pm
  
  Note, however, that exponential growth is a poor model for COVID-19, because the growth rate changes in response to precautions taken to limit the spread. In the US, after initial exponential growth, the growth in cumulative cases has been virtually linear for a month now (suggesting a steadily falling exponential growth rate), and the growth rate in cumulative deaths has been linear for three weeks.
  - Michael Cain in reply to Brandon Berg says:
    
    May 7, 2020 at 1:05 pm
    
    In a given population, an exponential model of Covid-19 growth is fine for an early part of the cycle. Precautions affect the exponential coefficient (eg, lengthen the doubling period), but the model is still reasonable. A linear model of Covid-19 is fine for a middle part of the cycle. A log-looking model of Covid-19 is fine for the later part of the cycle. A generalized logistic curve looks exactly like that. Logistic curves were developed in order to more accurately describe the evolution of some process in a population.
    
    My question is why there are so many otherwise smart people ignoring the fact that we know the process will follow a logistic (or similar) curve and insisting on fitting to something else. One thought, based on my experience the last time I was in graduate school, is that Excel (and in particular, Excel on Windows) has become the standard numerical platform for a lot of academics. And Excel doesn’t support fitting to less common models.
    - George Turner in reply to Michael Cain says:
      
      May 7, 2020 at 1:44 pm
      
      Well, you probably have a point that Excel doesn’t have a built-in logistic curve, so you have to build it all yourself.
      
      My original logistic spreadsheet, which was tuned to data from late February through mid-March, was pretty dead on till the end of March. For March 29th it predicted 135,288 cases and we had 144,980. Then we flattened the curve and it went out the window. By April 4th my model overpredicted by a factor of 3. By April 8th it overpredicted by a factor of 10. By April 11th it overpredicted by a factor of 20.
      
      My original curve had 270 million total cases and 4 million deaths by this point (based on an assumed 1.5% death rate).
      
      The model was great. The output was garbage because we changed the world it modeled.
      - Michael Cain in reply to George Turner says:
        
        May 7, 2020 at 3:33 pm
        
        So the underlying process is non-stationary. (We certainly hoped it was non-stationary, given initial outlooks.) Now we’re into a space where we talk about how the parameters are evolving. Which gets us back to the situation where one side (the IHME for convenience) says that the process is non-stationary, stochastic, and they’re trying to forecast how various actions will affect the evolution of the model. The other side says “a cubic whose only variable is time fitted to the data makes a nice forecast about the new cases going to zero quickly.” If you ask them what the coefficients mean, there’s no answer. Because the coefficients have no relationship to the underlying process. It’s a curve-fitting exercise, with an implication that if a quadratic looks better when they do next week’s forecast, they’ll use the quadratic. That’s what I’m complaining about. If we’re just curve-fitting, let’s at least pick curves that have some relationship to the underlying process.
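
        To make the complaint concrete, here is a sketch on synthetic data (nothing below is the CEA’s or IHME’s actual method or numbers). Daily deaths are generated from the derivative of a logistic curve, observed up to two weeks past the peak, and a least-squares cubic in time is fitted to them. The helper names and parameter values are assumptions for the example.

        ```python
        import math

        def fit_polynomial(xs, ys, degree):
            """Least-squares polynomial fit via the normal equations.

            Returns [a0, a1, ..., a_degree] for a0 + a1*x + ... + a_degree*x**degree.
            Only suitable for small, well-scaled problems like this one.
            """
            n = degree + 1
            a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
            b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
            # Gaussian elimination with partial pivoting.
            for col in range(n):
                pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
                a[col], a[pivot] = a[pivot], a[col]
                b[col], b[pivot] = b[pivot], b[col]
                for row in range(col + 1, n):
                    factor = a[row][col] / a[col][col]
                    for k in range(col, n):
                        a[row][k] -= factor * a[col][k]
                    b[row] -= factor * b[col]
            # Back substitution.
            coeffs = [0.0] * n
            for i in reversed(range(n)):
                tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
                coeffs[i] = (b[i] - tail) / a[i][i]
            return coeffs

        def daily_deaths(week, peak=2000.0, peak_week=1.0):
            """Synthetic daily deaths: scaled derivative of a logistic curve."""
            s = 1.0 / (1.0 + math.exp(-(week - peak_week)))
            return 4.0 * peak * s * (1.0 - s)

        weeks = list(range(-3, 4))  # observed weeks, centered to keep the fit stable
        observed = [daily_deaths(w) for w in weeks]
        coeffs = fit_polynomial(weeks, observed, 3)

        def cubic(week):
            return sum(c * week ** k for k, c in enumerate(coeffs))

        for w in (4, 5):  # extrapolate one and two weeks past the data
            print(w, "cubic:", round(cubic(w)), "generating curve:", round(daily_deaths(w)))
        ```

        On this synthetic series the cubic’s leading coefficient comes out negative, so its extrapolation drops through zero shortly after the last observation and keeps falling, while the generating curve tails off slowly. None of the four fitted coefficients corresponds to a growth rate, a peak date, or a final size.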
        
        Maybe I’m just frustrated. I’m working on a computer vision project and neural networks irritate me. Thousands of coefficients to set, a reduced gradient algorithm to optimize the fit to a big pile of data, knowing that there are lots of local optima, and no one has a clue about what it means that this particular coefficient gets set to a value of 2.6. I am really hoping that this project stays simple enough that I can avoid going the NN route.
        
        veronica d in reply to Michael Cain says:
        
        May 7, 2020 at 5:10 pm
        
        Yeah, I’m convinced NNs were invented to make mathematicians cry.
        
        One time I was talking to one of the engineers on {big NN framework}. I asked him how they do their optimization. He’s like, “Well, nothing really works well. We just do gradient descent with random restarts. It’s hit or miss. If you get a crappy result, we’ll just run it again.”
        
        It made me sad.
        
        That said, when well-tuned, NNs perform really well.
        
        So what does “well-tuned” mean in this case? How can I figure out how to tune it?
        
        “Well about that. See we can optimize over the tuning parameters using gaussian processes and…”
        
        Blarg!!!
        
        (Although gaussian processes are really cool.)
        
        Anyway, interpretability: what a quaint idea 🙂
        
        Michael Cain in reply to veronica d says:
        
        May 7, 2020 at 6:07 pm
        
        Yeah, I’m convinced NNs were invented to make mathematicians cry.
        
        We should write a post.
        
        A few weeks ago I downloaded one of the available demonstration NNs. The coefficients ran to something over 250MB compressed. The claim was that it was trained on some unimaginable number of images, collected and labeled by an army of ~~minions~~ graduate students, and that the training run that they were happy with took days on a setup with tens of thousands of GPU cores. I pulled a random cat picture out of my collection — part of my project requires that I recognize cats in a back yard setting — and fed it to the NN. It returned this:
        
        http://www.mcain6925.com/ordinary/cat02.jpg
        
        veronica d in reply to Michael Cain says:
        
        May 7, 2020 at 6:11 pm
        
        Holy cow!
        
        😛
        
        Oscar Gordon in reply to veronica d says:
        
        May 7, 2020 at 6:29 pm
        
        ISWYDT!
        
        Chip Daniels in reply to Michael Cain says:
        
        May 7, 2020 at 6:59 pm
        
        Look what it did with poor Bob Ross:
        https://www.youtube.com/watch?v=5DaVnriHhPc
        
        Marchmaine in reply to Michael Cain says:
        
        May 7, 2020 at 7:03 pm
        
        If I’ve learned anything about models and statistics and elections since 2016, I’d say that’s the FiveThirtyEight confidence model of a Cow.
greginak says:

May 7, 2020 at 11:51 am

I’ve long said HS kids should have at least half a year of stats even if that meant ditching some other math. Now I’m up to a year of stats with at least a quarter spent on models, their uses, misuses and challenges.
- fillyjonk in reply to greginak says:
  
  May 7, 2020 at 12:12 pm
  
  Stats and probability. You can teach these without calculus. I do. I am crap at calculus but not too bad at basic stats. (Not as good at probability because I never had coursework just in that, but I have a couple books that I hope will partially fix that when I read them this summer).
  
  Knowing stats and how models work would go a long way to people not getting so sucked in by some of the charlatans out there – including some of the economic/goldbug/whatever-investment people.
  - greginak in reply to fillyjonk says:
    
    May 7, 2020 at 12:16 pm
    
    Same. I was barely mediocre at Calculus years ago but could do grad level stats/prob. Stats/prob/models are part of our everyday life now and we really need to know them.
Philip H says:

May 7, 2020 at 11:54 am

I find it frustratingly ironic that an Administration and Political Party who dismiss valid climate modeling because of statistical best fit exercises now decides pandemic response policy based on cherry picked statistical curve fitting exercises. In the grand scheme of things it’s probably a medium-level hypocrisy, but it’s stuff like this that makes Republican moralizing so untenable.
- CJColucci in reply to Philip H says:
  
  May 7, 2020 at 12:33 pm
  
  “I’m shocked — SHOCKED! — that there is gambling going on here!”
  “Your winnings, Captain.”
- JS in reply to Philip H says:
  
  May 7, 2020 at 5:31 pm
  
  “exercises now decides pandemic response policy based on cherry picked statistical curve fitting exercises”
  
  That is a very, very generous explanation of “Let’s toss these data points into Excel and ask it to fit them”.
  
  “I make pretty picture I don’t understand” is more accurate. No actual math, science, or modelling was done. Someone hit a button in Excel that they don’t understand, and assumed the magic of computers generated a model and not a simple function.
Chip Daniels says:

May 7, 2020 at 1:23 pm

We have long ago passed the point of presumption of good faith bumbling.

This Administration and its party has a Soviet level of contempt even for the concept of expertise. All organs of the state, all institutions of the nation, are to be bent to the service of the ruling class and all their statements and arguments are merely word screens to that end.

We shouldn’t take anything they say seriously, since they themselves don’t.
Chip Daniels says:

May 7, 2020 at 4:06 pm

Related from Steve M over at No More Mister Nice Blog:
DISPATCHES FROM THE AMERICAN IDIOCRACY
https://nomoremister.blogspot.com/2020/05/dispatches-from-american-idiocracy.html

He notes that the current number one best seller at Amazon is a book of anti-vaxxer woo describing the current pandemic as a vast conspiracy by a corrupt cabal of government and big tech, with excited blurbs from conservative amplifiers Michelle Malkin and Ben Garrison.

Do Malkin and Garrison really believe this? Or is it a grift where they sold their name for a piece of the action?
Who knows, and what does it matter in the end?

Because the modern conservative movement is like an evangelical huckster organization, where belief and lies are just tools to be deployed in whatever fashion is needed at the moment to fill the collection plate.

When it is convenient to downplay the pandemic, they shrug it off as merely the flu; when it becomes important to present the pandemic as an apocalypse, they fan the flames of fear and paranoia.

But always, always, the eye is on the grift, the fleecing of the rubes by whatever means. Whether it is to stampede the rubes into voting for tax cuts for billionaires, or getting them to agree to go back to work and drop their unemployment claims, or maybe to buy some miracle elixir hawked by a radio jock, the transfer of money upwards is the overriding goal.
- Pinky in reply to Chip Daniels says:
  
  May 7, 2020 at 4:34 pm
  
  I hope the book isn’t what you describe, but I also think it isn’t. I mean, I don’t want to see an anti-vaxxer gain any credibility, but looking around online and checking the book’s table of contents, I don’t think it’s about covid-19 at all.
  - Chip Daniels in reply to Pinky says:
    
    May 7, 2020 at 5:13 pm
    
    The author is a certified loon:
    https://www.syracuse.com/coronavirus/2020/05/youtube-removes-plandemic-video-with-coronavirus-claims-by-dr-judy-mikovits.html
    “The Plandemic,” a 25-minute clip from an upcoming documentary, was taken off of YouTube this week for violating the Google-owned video site’s community guidelines. The video centered on Dr. Judy Mikovits, a former chronic fatigue researcher who claims the federal government is behind a “plague of corruption” to inflate profits from a potential vaccine even as COVID-19 threatens lives.
    
    The bigger point here is the nexus with the conservative faction which controls over half of our political system.
    
    It isn’t possible to speak of the conservative movement in terms of ideas and theory or conventional political ideology.
    Instead, it is just a loose confederation of grifters, charlatans, google eyed true believers and cynical special interests.
    
    That’s why I become so hostile to the idea of taking the President’s Tweets or the actions of Mitch McConnell and discussing them in cool cerebral terms as if they represented coherent thoughts.
    
    It’s a bit like hearing a pronouncement of Lysenko and then discussing the pros and cons of socialist theory on corn seedlings. The entire concept is mad.
    - veronica d in reply to Chip Daniels says:
      
      May 7, 2020 at 5:18 pm
      
      It isn’t possible to speak of the conservative movement in terms of ideas and theory or conventional political ideology.
      Instead, it is just a loose confederation of grifters, charlatans, google eyed true believers and cynical special interests.
      
      I wonder what Jacob Wohl has been up to lately? (she asks ironically)
      - JS in reply to veronica d says:
        
        May 7, 2020 at 5:35 pm
        
        If that wasn’t sarcasm, he’s caught up in yet another false sexual harassment claim — he hired someone to accuse Fauci of it and was in the process of hiring a second, when the first one apparently just up and confessed complete with recordings of her conversation with Wohl.
        
        How he has not actually been arrested yet is beyond me, or at least sued.
        
        veronica d in reply to JS says:
        
        May 7, 2020 at 5:41 pm
        
        It was definitely sarcasm.
        
        JS in reply to veronica d says:
        
        May 8, 2020 at 11:25 am
        
        It’s super hard to keep up with the insane news these days.
        
        This is the dumbest timeline.
        
        What’s the joke? This is the season where the directors and writers have just run out of ideas and they’re just throwing random crap at the wall? “What if Donald Trump was President? What if there was a plague? What if there was this one 20 year old guy, with backwards sunglasses, that kept trying to be a player and paying his GF to make felony accusations against people?”
        
        At this point, I’m expecting Ivanka Trump to announce her previously unknown cousin Willy is moving into the WH, that lovable scamp.
        
        Oscar Gordon in reply to JS says:
        
        May 8, 2020 at 11:44 am
        
        What if there was this one 20 year old guy, with backwards sunglasses, that kept trying to be a player and paying his GF to make felony accusations against people?”
        
        Wait? What?!
        
        Michael Siegel in reply to Oscar Gordon says:
        
        May 8, 2020 at 11:55 am
        
        I think he’s referring to Jacob Wohl, who did another of his “accuse someone famous of sexual misconduct” things, this time targeting Fauci.
    - Pinky in reply to Chip Daniels says:
      
      May 7, 2020 at 5:35 pm
      
      I think we’d both agree that a political movement is lost when its followers stop caring whether what they’re saying is true. But it took me about 10 minutes to find out that what you posted isn’t true, or at least to find good reason to believe that it isn’t true, and when I questioned you about it, you said that it doesn’t affect the bigger point. What am I supposed to do with that? I mean, if you’re right, you’re right, but I don’t sense that you care whether you’re right.
      
      You mention Malkin’s and Garrison’s blurbs, but you omit that Robert Kennedy Jr. wrote the foreword. Anti-vaxxers show up on the left and the right, but you use this as an opportunity to denounce Mitch McConnell? I don’t think you’re doing it in bad faith, but I think you’ve badly lost perspective.
      - Chip Daniels in reply to Pinky says:
        
        May 7, 2020 at 5:43 pm
        
        First of all, we are both getting the contents of the books second hand, so bad on us.
        And yes, RFK Jr. is also a loon who should be regarded with scorn.
        
        Which leaves us with my point, which is that the conservative movement has no core beliefs, but is just a collection of grifters etc.
        
        Can you find grifters on the left? Absolutely.
        
        Do you think the two factions, liberal and conservative, are equally bereft of ideas and equally infested with grifters and charlatans?
        
        Pinky in reply to Chip Daniels says:
        
        May 7, 2020 at 6:05 pm
        
        Equally? Not even close, and not in your favor.
        
        ETA: I’m sure I’m biased because I recognize that there are legitimate ideas on the right, even though they may not be getting air time these days. I don’t see any on the left.
        
        Chip Daniels in reply to Pinky says:
        
        May 7, 2020 at 10:39 pm
        
        And now the bat-signal of lunacy is being amplified throughout the conservative blogosphere:
        
        https://www.thegatewaypundit.com/2020/05/youtube-deletes-video-plandemic-dr-mikovits-accusing-dr-fauci-corruption-suppression-not-approved-thought-police/
        
        I give it 50/50 odds at least one of our OT commenters will start reciting the talking points soon.
        
        Pinky in reply to Chip Daniels says:
        
        May 8, 2020 at 8:26 am
        
        I’ve been seeing comments that “They” are lying about the numbers equally commonly on the left and right. I’m not talking about undercounting or overcounting debates, but the accusation – no, the assumption that differences between reality and statistics are deliberate.
        
        Mike Schilling in reply to Pinky says:
        
        May 8, 2020 at 2:13 pm
        
        Equally by Twitter commenters on the left and elected officials on the right.
        
        Philip H in reply to Pinky says:
        
        May 8, 2020 at 2:46 pm
        
        @Pinky – with reporting like this, why wouldn’t some of us be suspicious of the reporting in Republican controlled states:
        
        https://www.cnn.com/2020/04/29/politics/florida-coronavirus-death-figures-withheld/index.html
        
        George Turner in reply to Philip H says:
        
        May 8, 2020 at 3:38 pm
        
        Someone raised privacy concerns and other reporters noted that the Health Department counts didn’t match the medical examiner counts, so the bureaucracy was set in motion, doing what bureaucracies do.
        
        It’s far worse in Democrat states where they can’t release any information that might contradict the idea that all medical problems are caused by European colonialism, manspreading, and toxic fandoms.
        
        Kazzy in reply to Pinky says:
        
        May 8, 2020 at 4:41 pm
        
        I watched the Plandemic idiocy. One of the big secrets about Covid, the liberal counting of deaths, was so secret that they had to dig up footage of Dr. Birx discussing it in a press conference and substantiate that with memos sent to medical facilities across the country.
        
        WHY ARE THEY LYING?!?! [eye roll]
        
        There is room to take issue with the way data is being collected and reported. There is nothing to suggest a vast conspiracy. Quite the opposite.
        
        Our current leaders couldn’t rig a BINGO game… and that is assuming they have the wherewithal to work together on, well, anything.