On the Way to Fraud [UPDATED]

Chris

Chris lives in Austin, TX, where he once shook Willie Nelson's hand.


55 Responses

  1. greginak says:

    Interesting story. When the research first came out I read the first few lines and then dismissed it as wildly improbable; there was clearly some over-promotion or a mistake somewhere. So I can’t say I’m surprised that it actually was BS. The journals have been too trusting for too long of the papers they see. Sad but true, even with the long times it can take to get a paper published and all that.Report

    • Chris in reply to greginak says:

      I remember thinking it was implausible as well, and hadn’t thought about it since. But I just figured it was a research issue, not fraud. I don’t think fraud would ever have crossed my mind when reading it. And in a sense, it shouldn’t, but at the very least readers and other researchers and journalists should have, collectively, applied the brakes with respect to the findings.Report

      • Glyph in reply to Chris says:

        First of all, awesome write-up. Seriously.

        Secondly, what is interesting to me here is, for lack of a better word, “BS detection”, and the ways in which it can be defeated (more specifically: why human BS Detectors are circumvented in some scenarios for some people, but not for others; and the next scenario might flip that around, with a completely-different set of people uncritically accepting some BS, and others calling it out.)

        I suspect that “biases” and “incentives” are pretty much ALWAYS the answer; in which case, since we know biases/incentives cannot be eliminated, then exposing any narrative or issue to multiple biases/incentives is always best.

        To tie it into Dan’s Fox piece and my comments there: if Fox were actually good at its declared job, then Fox would be an unambiguously good thing, regardless of whether you agree with its underlying biases.Report

        • Chris in reply to Glyph says:

          Thank you. And you’re right about the bias and incentives. I think it was Will who said (on Twitter) that motivated reasoning is really, really powerful. It is, overwhelmingly so. And we’re all subject to its power.Report

  2. Jaybird says:

    Appeal to the prejudices of supporters + Appeal to the prejudices of opponents == Too good to check.

    The three laws are perfect and, ideally, would prevent stuff like this from happening… but… well, it seems like the only real solution is to run LaCour out of town on a rail and make his name be used in the same breath as Stephen Glass and Jayson Blair and point to LaCour on the first day of any given senior-level or grad school research class until we get another example that we can use.Report

    • Chris in reply to Jaybird says:

      Oh, he will undoubtedly be run out of town like every caught fraudster before him (I am thinking, specifically, of one whom I respected a great deal prior to his fraud being discovered: Marc Hauser). His name will be widely known. His tale will be a cautionary one. Fraud will still happen, because research takes forever, and when you realize it’s not working, you will have wasted that time, and effort, and you will be behind. For grad students and untenured faculty, for whom publications are everything (in cog psy, 5-7 in grad school are necessary for tenure-track jobs, and 3-5 a year as junior faculty are necessary for tenure), fraud’s going to happen because people panic.

      The only real way to reduce the amount (which, I like to think, is pretty rare as it is) is to change the incentives: make publication numbers less of an issue, and quality research more of one. Then no one cares that you have a Science pub, or that you have 5 pubs in top journals, but only that you’ve demonstrated an ability to do good research.Report

      • Jaybird in reply to Chris says:

        Every “no significant results to report” paper looks the same, though…Report

        • Kim in reply to Jaybird says:

          Which is why you always have something to report, even if it’s just bolstering already known data!!
          (Seriously, smart researchers know how to twist one study into 10 findings, and only report on the most interesting of the ten — so if five fail in an entirely probable fashion, you report on the seventh.).Report

          • Chris in reply to Kim says:

            Seriously, smart researchers know how to twist one study into 10 findings, and only report on the most interesting of the ten — so if five fail in an entirely probable fashion, you report on the seventh.

            This does happen, and it is an absolutely horrible practice. With every comparison you make, the probability of finding a statistically significant result goes up. This is not the way to do good science.Report

            • Kim in reply to Chris says:

              Depends… If you’re looking at an entirely predictable result (that 30 years of research will back up), you’re probably not seeing something spurious.

              (Besides, you know as well as I do that I’m not really talking cherry-picking Anything Significant).Report

  3. Richard Hershberger says:

    You touched on an important point: publishing this in a high-profile publication is what brought them down.

    My brother was an academic chemist. He once came across, in some fifth-tier journal, a paper that was relevant to his research. So he attempted to replicate it, and was unable to. The next step is to contact the paper’s author to discuss the techniques used in more detail. The initial contact was easy: he simply got the guy on the phone. Once they got past the preliminaries, and my brother had explained why he was calling, the guy became evasive and nervous. That conversation ended inconclusively, and my brother was never able to reach him again by any medium. Eventually it dawned on my brother that the paper’s results had been falsified. He didn’t pursue it further. Actually proving this would have been at best a lot of work, and really, why bother?

    The thing about fifth-tier journals is (1) there are a lot of them, and (2) they largely don’t matter. They exist to give an outlet for academics to meet their publish-or-perish quotas, not to publish interesting and significant results leading to further research. Those papers are published higher up in the food chain. This isn’t to say that everything in the fifth-tier journals is fraudulent, but some substantial portion of them are. In my brother’s case it was just dumb luck that a paper in one of these journals caught his attention.

    I don’t think anyone starts down this road intending to fake results, but the sunk costs bring them around. The smart ones at least know not to make their fake results all that interesting.Report

    • Glyph in reply to Richard Hershberger says:

      “The smart ones at least know not to make their fake results all that interesting.”

      Yeah, I said this yesterday. If you don’t want to get caught, keep things boring. It’s always ambition that brings you down!Report

      • Chris in reply to Glyph says:

        I suspect he didn’t fully understand how big the findings would be. I mean, I figure he knew they’d be seen as important in his field, but as I say in the post, I’d bet a lot of money that the one who chose Science as the first journal to submit it to (and it must have been the first) was Green, who plausibly claims to have had no knowledge of the fraud. I suspect that, when Green made that suggestion, LaCour felt the ground falling out from beneath his feet, but at that point, what’s he going to do? His choices are then between an absolutely certain career-ending revelation to Green, his adviser, his department, and his university, or an almost completely certain career-ending revelation to the whole world. In the moment, that “almost” must look like the only possible route to take.Report

        • Chris in reply to Chris says:

          I will add, though, that if there’s an inexplicable aspect to the nature of his fabrication, it’s in the size of the effect. Seriously, why the hell would you make it that big? And at 9 months, no less! He’d have been published if the effect had still been showing up at 3 months and had been half (or a quarter) as big. He definitely got greedy, but again, I bet he had no idea it’d be Science material. And if he did, he’s either an idiot or a sociopath.Report

          • Vikram Bath in reply to Chris says:

            I’d offer as an explanation that he might not have realized that effect sizes matter. He might have generated the data using some process and seen that the results were statistically significant and thought that was enough. A lot of academics don’t pay that much attention to effect sizes. I’d be unsurprised if a grad student missed their importance.Report

  4. Kim says:

    Yawn. There will always be people trying to fake data. Other than sullying (briefly) the quality of academic research, they’re mostly irrelevant.

    Where better procedures are needed is in preventing “unethical research” from being used as seed corn in order to make flamboyant (perfectly true!) articles [In fact, at least one of them has been cited on this site].

    It’s always easier to prove something if you already know what you’re looking for, after all.Report

  5. Vikram Bath says:

    Great writeup. Thank you.

    We should have waited for replications before we got excited about the implications of the findings, and we certainly shouldn’t have used the results to guide policy or practice.

    This, I’m not so sure about. You’d need to ask what we were using previously. If the study results were used to replace hunches, then I think that was probably a sound decision at the time even without replication.Report

    • Chris in reply to Vikram Bath says:

      Thank you.

      I suppose that’s true, in that just about any empirically-based idea is better than a gut-based hunch, but I doubt it’s ever that clear cut. There will usually be at least some research, and unless we’re talking about something completely new, there will be experience-based practices.

      But I realize I’m extremely conservative when it comes to science and its application: I want to see replications of the replications of the replications before I take it very seriously. I can get excited about early results, but I ain’t talking to policy-makers about it, and I ain’t investing money in programs and training based on it.Report

      • Vikram Bath in reply to Chris says:

        I think some of this reflects our different disciplines. When you get something wrong, you might kill someone. If I get something wrong…well, I might never know. And it’s relatively frequent that companies succeed by executing well on the wrong thing rather than waiting for the right thing. Further, acting on mistaken information and observing the results might be the best way to discover what the correct information is.

        But I could understand why you might not want to take that sort of approach.Report

        • Chris in reply to Vikram Bath says:

          Ah, I think you’ve just awakened me to a distinction I was missing, which is likely important. Applied research (of which I’ve never done any) has to deal with these issues much more directly, with more complications. My experience is, in a sense, not as applicable as I’d like to think.Report

  6. j r says:

    Great write up!

    Here’s my question: would it have been so bad to simply turn in the paper that found that contact has no statistically significant effect on changing minds? That paper doesn’t get you in Science and it doesn’t get you a tenure track position at Princeton before you graduate, but it gets you a PhD and a shot at a decent career.

    Was it that his CV was so otherwise unimpressive that he felt that he had no choice but to go big?Report

  7. James K says:

    Great post Chris, though I do have a largely unrelated methodological question:

    A difference of 0.8 on a five-point scale . . . wow! You rarely see this sort of thing. Just do the math. On a 1-5 scale, the maximum theoretically possible change would be 4. But, considering that lots of people are already at “4” or “5” on the scale, it’s hard to imagine an average change of more than 2.

    I may be reading that wrong, but it sounds to me like a linear specification is being fitted to the response on the 5-point scale. Is that normal? Because I wouldn’t want to fit a linear model to something that is bounded at both ends.Report

    • Chris in reply to James K says:

      Thank you.

      I’m not sure I understand your question. He’s speaking about a fairly simple case: the maximum absolute difference possible between two instances (or two means) on a 5-point scale. In this case, it’s the difference between the mean ratings in two experimental conditions. So the maximum logically possible difference is that between a 1 and 5, or 4, and because a lot of people will select 4 or 5, the maximum reasonable difference is going to be smaller (about 2) on average. That is, the 4’s and 5’s are going to pull the means up sufficiently that it’s unlikely the differences could be anything close to 4 in practice.Report
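
      To put rough numbers on that ceiling effect, here is a minimal Python sketch; the baseline distribution of ratings is invented purely for illustration:

      ```python
      import numpy as np

      # Hypothetical baseline distribution of responses on a 1-5 scale,
      # skewed toward 4s and 5s (these proportions are made up for illustration).
      scale = np.array([1, 2, 3, 4, 5])
      baseline_probs = np.array([0.10, 0.15, 0.20, 0.30, 0.25])

      baseline_mean = float(np.dot(scale, baseline_probs))   # about 3.45

      # Even if every respondent jumped to the top of the scale, the mean could
      # only rise to 5, so the largest possible average shift is bounded:
      max_mean_shift = 5 - baseline_mean                      # about 1.55

      print(f"baseline mean:          {baseline_mean:.2f}")
      print(f"maximum possible shift: {max_mean_shift:.2f}")
      # Against a ceiling like this, a claimed average shift of 0.8 is enormous.
      ```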

      • James K in reply to Chris says:

        @chris

        Ah, so it’s just a T-test or Chi-Square test on the difference between the subgroup means. I forget regressions are much less popular outside of Economics.Report

        • Chris in reply to James K says:

          Likely ANOVA, as there were multiple conditions (and t-tests for post hoc). And of course, ANOVA is a special case of regression.

          Actually, political scientists use regression a great deal, but this was a proper experiment, and therefore regression would be logically unnecessary.Report
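
          Since the ANOVA-is-a-special-case-of-regression point comes up here, a minimal sketch with invented data, showing that a one-way ANOVA and an OLS regression on dummy-coded conditions give the same F-test (only numpy and scipy are assumed):

          ```python
          import numpy as np
          from scipy.stats import f_oneway, f as f_dist

          rng = np.random.default_rng(42)

          # Three invented experimental conditions, 25 observations each.
          groups = [rng.normal(loc=mu, scale=1.0, size=25) for mu in (3.0, 3.2, 3.9)]

          # One-way ANOVA.
          F_anova, p_anova = f_oneway(*groups)

          # The same test as a regression: dummy-code the conditions and compare
          # the full model against an intercept-only model.
          y = np.concatenate(groups)
          n, k = len(y), len(groups)
          X = np.zeros((n, k))
          X[:, 0] = 1.0            # intercept
          X[25:50, 1] = 1.0        # dummy for condition 2
          X[50:, 2] = 1.0          # dummy for condition 3

          beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
          rss_full = np.sum((y - X @ beta) ** 2)
          rss_null = np.sum((y - y.mean()) ** 2)   # intercept-only model

          F_reg = ((rss_null - rss_full) / (k - 1)) / (rss_full / (n - k))
          p_reg = f_dist.sf(F_reg, k - 1, n - k)

          print(f"ANOVA:      F = {F_anova:.4f}, p = {p_anova:.4g}")
          print(f"Regression: F = {F_reg:.4f}, p = {p_reg:.4g}")   # identical up to rounding
          ```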

          • James K in reply to Chris says:

            @chris

            Ah, ANOVA of course.

            I would have considered a regression specification anyway; after all, there’s no way the subgroups can be perfectly controlled. Also, there’s a technique for handling categorical dependent variables (though I’ve never tried it with more than 2 categories) that would make it possible to relax the assumption that (for example) the difference in attitude between points 2 and 3 on the scale was the same as between points 4 and 5. It would also deal with the symmetry of changing opinions at the ends of the scale. It’s bad form statistically to assume normal disturbances in a variable with strictly bounded edges.Report

            • Chris in reply to James K says:

              ANOVA is famously robust to fairly large deviations from the normality assumption, of course, but I get your point.

              Looking through the paper again, it looks like they just used t-tests for the main comparisons (the results section and supplemental material suck, but they usually do in Science).Report

  8. Pinky says:

    I could not disagree more with your First Law. Do not EVER deny the validity of data because the results are counterintuitive. Research the data like crazy, question every step you made along the way, but don’t dismiss it. You may have found something extraordinary, or you may have stumbled upon a one-shot statistical outlier. Both things happen.

    I’m reminded of a story, which I’m probably going to tell wrong, of a team that was investigating Mount Saint Helens in early 1980. There had been unusual seismic activity in the area, so they brought in cutting-edge laser sensors that could detect ground shifts in the range of fractions of millimeters. The next day they checked the sensors, which reported that they’d moved two feet. They assumed that the sensors must not be working right.

    Extreme example, I know. But when you start disregarding evidence that seems wrong, or signing off on data because it seems right, you’re throwing away the value of empirical research.

    There’s no shame in reporting results that can’t be duplicated, either. 5% of all reports with a 95% confidence interval are misleading. You put them out there anyway. We’re all aware of the pressures to publish. You shouldn’t get a job at Princeton because you hit upon a 1-in-20 fluke, any more than you should lose one for only rolling a 19 on the 20-sided die. That’s an institutional problem. Institutional problems shouldn’t affect how you analyze data.Report

    • Chris in reply to Pinky says:

      Counterintuitive data is the best data, from a scientific perspective, because it shows us something new. Too good to be true is different. In this case, it was huge effects, much larger than such research usually produces, with a finding that defies reasonable explanation.Report

  9. Gabriel Conroy says:

    Journals like Science will, and should, continue to have a quick turnaround, resulting in a less rigorous review process, even for improbable results, as long as those results are likely to make a big splash.

    Why the “and should”? I’m no scientist, but from what you’re saying, Science is one of the two premier journals. Why not have a more rigorous review process?Report

    • Chris in reply to Gabriel Conroy says:

      Because they provide a pretty valuable service using that model: solid, early results of broad interest that should spark a lot of follow up, perhaps in multiple disciplines or sub-disciplines.Report

      • Gabriel Conroy in reply to Chris says:

        Got it. It just seems strange coming, as I do, from a discipline (history) where peer review takes so long that articles are vetted and re-vetted so much as to kill their spirit. Or so I’ve heard… I’ve never actually tried to publish a peer-reviewed article.

        Maybe it’s in a weird way a function of what Vikram says above. A historian who gets his or her argument wrong is probably not thereby going to kill people, while in the harder sciences the stakes are higher?Report

  10. aaron david says:

    Excellent post @chris

    Do you think this could change the perception of Science as a good journal? Or is this just one of those things that, at this point, just happen?

    While I have a couple of scientists in the family, they only published in field-specific journals.Report

    • Chris in reply to aaron david says:

      First, thank you. You are all too kind.

      Second, I hope not. While I can’t speak for the harder sciences, everyone in the social and behavioral sciences recognizes Science and Nature for what they are. That is, they’re not where you will find comprehensive, detailed, multi-study papers that are perhaps years in the making and meant to cover as many objections as possible. Those get published in more specialized journals. Science and Nature are for “sexy” but well-conducted studies with cross-disciplinary appeal. Like this one, if it had panned out.Report

    • aaron david: Do you think this could change the perception of Science as a good journal?

      I can only speak for myself, but my answer is: not in the slightest. They still have just about the most interesting pieces of any publication.Report

  11. zic says:

    There have been a lot of problems with research publishing over the years. One of our investment companies found this out, expensively. It had hired a doctor to oversee clinical trials some months before the NYT did an exposé on papers written by employees of drug companies but published under the names of independent researchers; this doctor was one of those ‘independent authors.’

    We had done seed funding, and needed follow-up funding to keep the company growing; and now, the person we’d hired to oversee clinical trials was in the middle of a huge controversy that brought into question the integrity of the trials he was conducting. His job also included talking to potential investors about those trials; he was very much a public face of the company.

    This is one of the very few times the decision makers in the family took my advice: fire him, halt the trials currently under way, and start over, because the company’s ability to grow and thrive depended on the perception of the integrity of those trials. This proved wise: the economy had just collapsed and investors were jittery, and it created enough confidence in the research for follow-up investors to invest.

    The integrity of peer-reviewed research has a lot of implications not just in academia, but in the business world and for rubes like me who might hire people who write research papers and conduct clinical trials and do experiments.

    This company is still thriving, has recently presented the results of the new trials we conducted in the wake of that debacle, and they’ve been well received. It may, someday, make a profit. But I don’t think it would have survived if we hadn’t fired that doctor.

    I’m not convinced he did anything wrong; but he participated in something wrong, and that created the impression of dishonest research — an impression of fraud — and that, I think, would have destroyed the business if we hadn’t fired him.

    Integrity matters.

    Stellar post, @chrisReport

  12. RTod says:

    I could be wrong, but I think TAL just did a segment on this study, which must have been produced prior to the fraud coming out.Report

  13. Oscar Gordon says:

    Excellent post, Chris! Thank you for writing it.Report

  14. Glyph says:

    I know this is late, but I just saw this this AM and it seemed relevant.

    How easy it is to perpetrate and then widely disseminate fraudulent (or, probably more often, just plain bad) study results:

    http://io9.com/i-fooled-millions-into-thinking-chocolate-helps-weight-1707251800Report

    • Chris in reply to Glyph says:

      Yeah, this is something I was hinting at in the stats conversation with James earlier in the thread, which is actually an even bigger issue than they’re getting at.

      For those who haven’t read Glyph’s link (which is fascinatingly disturbing), they ran a “real” study (though a really, really small one) and then ran a ton of statistical comparisons between the groups, some of which yielded statistically significant results. It turns out that even with fairly rigorous criteria for statistical significance, with multiple comparisons you’re likely to find at least some that are statistically significant just by chance.

      The math for this is pretty simple. They describe it at the link, but I’ll lay it out here as well. Assuming your comparisons are independent (that is, the result of one does not depend on some aspect of another), the probability of getting at least one significant result at a particular alpha-level (your pre-determined criterion for statistical significance) with k comparisons is 1 - (1 - alpha)^k. So, for example, if your alpha-level is p < .05 (as is common in the social and behavioral sciences), and you run 20 comparisons, your probability of at least one significant result is 1 - (.95^20), or .64. That is, with 20 comparisons and an alpha-level of .05, you have a 64% chance of getting a statistically significant result, which would lead you to reject the null hypothesis, perhaps incorrectly.

      In the story Glyph links, they ran 18 comparisons. Assuming the comparisons were independent (they weren’t, since they were related health measurements, so the true probability is actually higher), the probability that they’d find at least one statistically significant result at the .05 level is 1 - .95^18, or .603. In other words, they were more likely to find a statistically significant result than not to find one.
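
      If you want to check those numbers yourself, here is a minimal Python sketch of the formula above; the alpha level and comparison counts are just the ones from this discussion:

      ```python
      # Family-wise error rate for k independent comparisons:
      # P(at least one false positive) = 1 - (1 - alpha)^k

      def familywise_error_rate(alpha: float, k: int) -> float:
          """Probability of at least one significant result by chance alone."""
          return 1 - (1 - alpha) ** k

      for k in (1, 18, 20):
          print(f"alpha = 0.05, comparisons = {k:2d} -> "
                f"P(at least one false positive) = {familywise_error_rate(0.05, k):.3f}")
      # Prints roughly 0.050, 0.603, and 0.642, in line with the figures above.
      ```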

      Now, researchers know about this problem, and if they’re honest, they will account for it with various correction techniques, the most common of which in psychology is the Bonferroni correction. It’s really, really simple: take your alpha-level, divide it by the number of comparisons, and use that as the threshold each individual p-value has to beat. To maintain an alpha-level of .05 with their 18 comparisons, they’d need p-values below .05/18, or about .0028. Peer reviewers will, when multiple comparisons are made, look for such a correction, and studies that don’t use one generally won’t get published.
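
      And a minimal sketch of the correction itself; the p-values here are invented purely to show how the adjusted threshold works:

      ```python
      # Bonferroni correction: with m comparisons, compare each p-value to
      # alpha / m instead of alpha, keeping the family-wise error rate at or below alpha.

      def bonferroni_threshold(alpha: float, m: int) -> float:
          """Per-comparison significance threshold after Bonferroni correction."""
          return alpha / m

      # Invented p-values standing in for 18 comparisons.
      p_values = [0.04, 0.012, 0.0009, 0.20] + [0.03] * 14
      threshold = bonferroni_threshold(0.05, len(p_values))

      print(f"corrected threshold: {threshold:.4f}")   # 0.05 / 18 is roughly 0.0028
      for p in p_values:
          print(f"p = {p:.4f} -> significant after correction: {p < threshold}")
      ```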

      However, there’s a more serious problem related to the multiple comparisons problem: while researchers almost always use methods like the Bonferroni correction to adjust their significance criteria within a single study, they rarely, if ever, use them between studies. Since it’s not uncommon for researchers in the social or behavioral sciences to run the same or similar studies multiple times, the multiple comparisons problem comes into play again: if they run the same study 10 times, the probability of finding a statistically significant result at the .05 level at least once is about 40%. Since that is much higher than the nominal 5% chance of such a result arising by accident, it’s very likely that a bunch of such findings, which would of course be published, are incorrect. That is, a non-trivial proportion of published social and behavioral scientific findings, particularly those that rest primarily on a single published study, are likely bogus.
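
      The between-study version is easy to see by simulation: run a two-group “study” with no true effect ten times and check whether any attempt comes out significant. A rough sketch, with the group size and number of simulation runs chosen arbitrarily, and scipy’s independent-samples t-test standing in for whatever test a given study would use:

      ```python
      import numpy as np
      from scipy.stats import ttest_ind

      rng = np.random.default_rng(0)

      def any_significant(n_studies=10, n_per_group=30, alpha=0.05):
          """Run n_studies null 'studies' (no true effect); report whether any is significant."""
          for _ in range(n_studies):
              a = rng.normal(size=n_per_group)
              b = rng.normal(size=n_per_group)   # drawn from the same distribution
              _, p = ttest_ind(a, b)
              if p < alpha:
                  return True
          return False

      n_sims = 5000
      hit_rate = sum(any_significant() for _ in range(n_sims)) / n_sims
      print(f"P(at least one 'significant' study in 10 attempts) ~ {hit_rate:.2f}")
      # Lands near 1 - 0.95**10, about 0.40, the figure quoted above.
      ```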

      Of course, science has a pretty straightforward corrective mechanism for this: replication. However, if one fails to replicate a published finding once or twice, it’s unlikely anyone would publish the results, so you will have to fail to replicate it multiple times. Which, as you now know, raises the multiple comparisons problem.

      This isn’t intractable, and again, there are built-in mechanisms for dealing with it. Really, the basic concept of hypothesis testing is a corrective. However, it means that false results will slip through the cracks, and it means that if one really wants a statistically significant result, one can probably find one, and one can probably get it published. And given the way science journalism works today, if one’s fake result is sexy enough, one can get a lot of coverage for it, meaning that the result will be part of the public consciousness for years even if it is shown to be false relatively quickly.Report

      • Chris in reply to Chris says:

        Ugh, for some reason it didn’t let me use the Greek symbol for alpha.Report

      • Oscar Gordon in reply to Chris says:

        Chris: and given the way science journalism works today, if one’s fake result is sexy enough, one can get a lot of coverage for it, meaning that the result will be part of the public consciousness for years even if it is shown to be false relatively quickly.

        This is the greater problem to be dealt with & is similar to news corrections or retractions in that the corrected information is often buried. Online publications are somewhat better in that corrections can be added directly to the article in question, but that’s only valuable at the source. The outfit that reports on a report, or reblogs on it, may ignore such corrections. Also, if HuffPo reports on some study & 6 months later it’s shown to be flawed, HuffPo might update the original, or pull it, but they certainly won’t run a new article for the front page unless there was a juicy scandal associated with it.

        Toss in our national pastime of loving a good conspiracy theory & even if corrections are made & widely publicized, there will be a disturbing number of people who will refuse to discard the original report & assume Big X quashed it for nefarious reasons.Report

      • Bert The Turtle in reply to Chris says:

        Green Jelly Beans Cause Acne!

        https://xkcd.com/882/Report