On the Way to Fraud [UPDATED]
If you will bear with me for a moment, I would like to begin with a story. When I was a grad student, lo these many years ago, I was mildly (read: pathologically) obsessed with a particular theory of embodied cognition. At a general level, the theory is pretty straightforward: bodily actions activate particular concepts, which in turn influence our perception and understanding of the world. The people advocating this theory have specific ideas about which actions activate which concepts, and I generally found (and still find) those ideas absurd. So from the first weeks of my first semester of graduate school, I set out to show just how absurd they were.
What this entailed, in short, was taking the action-concept connections seriously and having people perform the actions to see if the concepts were activated. I came up with a lot of absolutely silly experiments involving things like people rolling across the floor in chairs, or going up and down escalators, and then using some measure of conceptual activation. Long story short: none of them worked, because the particular theory of embodiment is as silly as the experiments were.
After a couple years of this I gave up, because it was a waste of time and effort and I wasn’t going to get publications out of it. Let’s face it, after a while that’s what most science graduate students care about. The research program lay dormant for a while, until another grad student heard about it and somehow convinced me to revive it (he will make an excellent used car salesman if his psychology gig doesn’t work out). So we took an idea that I’d ditched when I abandoned the whole project, modified it a bit, and ran with it. Then we got the data, and it was working! Not just working: the effect was huge. I mean, really huge! We talked to our adviser, we talked to some other people, and everyone agreed: this finding was a really, really big deal. The refrain around the lab became, “We’re going to publish this in Science!”
Here’s the thing, though: the results were too good to be true, so even as we celebrated, we knew something was probably not right. So we went over every step of the method, including the MATLAB code that ran the experiment, and to our great disappointment we discovered an error in the way we were running it. That error, not the hypothesized connection between a particular action and a particular concept, was responsible for the huge effect. When we fixed the code and re-ran the study, the effect disappeared. Zip, zilch, nada.
We were devastated, not because we had thought we had discovered evidence for a revolutionary view of the human mind (remember, we thought that view was bullshit, because it is), but because we thought we were going to get published in a major journal like Science, which would have (combined with our other, lesser publications) made getting a tenure-track position at a major research school much, much easier. In other words, it would have made our careers.
This story should make clear, then, just how much such a publication would mean to graduate students, and what they (and untenured faculty) would give to get their research published in Science or Nature: a kidney, maybe two, their first-born child, and, if the Devil is doing the peer reviewing, their immortal soul. It is a really, really big deal.
I tell you all of this to shed a bit of light on this week’s revelation of fraud in political science. If you haven’t heard the story, here are the basics (check out FiveThirtyEight and Retraction Watch for more in-depth discussions): In December of last year, Michael LaCour, a graduate student in political science at UCLA, and Donald Green, a political science professor at Columbia, published a paper titled “When contact changes minds: An experiment on transmission of support for gay equality” in Science. Here is the paper’s abstract:
Can a single conversation change minds on divisive social issues, such as same-sex marriage? A randomized placebo-controlled trial assessed whether gay (n = 22) or straight (n = 19) messengers were effective at encouraging voters (n = 972) to support same-sex marriage and whether attitude change persisted and spread to others in voters’ social networks. The results, measured by an unrelated panel survey, show that both gay and straight canvassers produced large effects initially, but only gay canvassers’ effects persisted in 3-week, 6-week, and 9-month follow-ups. We also find strong evidence of within-household transmission of opinion change, but only in the wake of conversations with gay canvassers. Contact with gay canvassers further caused substantial change in the ratings of gay men and lesbians more generally. These large, persistent, and contagious effects were confirmed by a follow-up experiment. Contact with minorities coupled with discussion of issues pertinent to them is capable of producing a cascade of opinion change.
The study was pretty simple: two types of survey canvassers, gay and straight, read one of two scripts: a script sharing a story about why gay marriage was important to the canvasser personally, or a script sharing a story about why recycling was important to the canvasser personally. They found that when gay canvassers read the gay marriage story, their impact on the opinions of those surveyed was large and long-lasting, showing up even after 9 months.
The effect was huge, and based on our existing knowledge of influence, virtually inexplicable. Political scientist Andrew Gelman of The Monkey Cage had this to say about the size of the effect soon after it was published:
What stunned me about these results was not just the effect itself—although I agree that it’s interesting in any case—but the size of the observed differences. They’re huge: an immediate effect of 0.4 on a five-point scale and, after nine months, an effect of 0.8.
A difference of 0.8 on a five-point scale . . . wow! You rarely see this sort of thing. Just do the math. On a 1-5 scale, the maximum theoretically possible change would be 4. But, considering that lots of people are already at “4” or “5” on the scale, it’s hard to imagine an average change of more than 2. And that would be massive. So we’re talking about a causal effect that’s a full 40% of what is pretty much the maximum change imaginable. Wow, indeed. And, judging by the small standard errors (again, see the graphs above), these effects are real, not obtained by capitalizing on chance or the statistical significance filter or anything like that.
Gelman does his best to come up with a plausible explanation for the finding in that post, but it’s clear he’s reaching, and that he recognizes he’s doing so. At no point, however, does he question the validity of the findings. In fact, no one did (I read the paper, as I do any social or behavioral science research in Science or Nature, and I certainly didn’t question it). The FiveThirtyEight post linked above describes the coverage, and influence, of the paper, all of which was unquestioning:
By describing personal contact as a powerful political tool, the paper influenced many campaigns and activists to shift their approach to emphasize the power of the personal story. The study was featured by Bloomberg, on “This American Life” and in activists’ playbooks, including those used by backers of an Irish constitutional referendum up for a vote Friday that would legalize same-sex marriage.
“How to convince anyone to change their mind on a divisive issue in just 22 minutes — with science,” was one catchy headline on a Business Insider story about the study.
It wasn’t until this month, when a professor and a graduate student set about trying to replicate the study, that anyone realized something was up. It turns out the problems were easy to see, had anyone looked. Here is the summary of their findings (which are detailed here):
We report a number of irregularities in the replication dataset posted for LaCour and Green (Science, “When contact changes minds: An experiment on transmission of support for gay equality,” 2014) that jointly suggest the dataset (LaCour 2014) was not collected as described. These irregularities include baseline outcome data that is statistically indistinguishable from a national survey and over-time changes that are unusually small and indistinguishable from perfectly normally distributed noise. Other elements of the dataset are inconsistent with patterns typical in randomized experiments and survey responses and/or inconsistent with the claimed design of the study. A straightforward procedure may generate these anomalies nearly exactly: for both studies reported in the paper, a random sample of the 2012 Cooperative Campaign Analysis Project (CCAP) forms the baseline data and normally distributed noise are added to simulate follow-up waves.
They even contacted the company that LaCour claimed had conducted the surveys, and it had never heard of the project. What does this mean? It means that the experiment was never actually run. The data were produced from an existing dataset, with statistically generated noise added to make one set of responses look like multiple survey waves (a baseline plus the follow-ups). In short, the entire thing was fabricated. A rough sketch of how such a procedure might work follows.
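To make the mechanics concrete, here is a minimal sketch, in Python, of the kind of procedure Broockman and Kalla describe. Everything in it is invented for illustration: the ccap array is a random stand-in for the real 2012 CCAP thermometer data, and the noise level is arbitrary.

```python
# A minimal sketch of the fabrication procedure Broockman and Kalla describe.
# "ccap" is an invented stand-in for the real 2012 CCAP feeling-thermometer
# data; the noise level is arbitrary and chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an existing survey: 10,000 respondents' 0-100 thermometer ratings.
ccap = rng.integers(0, 101, size=10_000).astype(float)

# Step 1: draw the "baseline wave" as a random sample of the existing survey
# (n = 972 matches the number of voters reported in the paper's abstract).
baseline = rng.choice(ccap, size=972, replace=False)

# Step 2: simulate each "follow-up wave" as baseline + normally distributed noise.
followups = [baseline + rng.normal(0, 2.0, size=baseline.size) for _ in range(3)]

# The telltale signatures: wave-to-wave changes are pure Gaussian noise, and
# test-retest correlations are far higher than real panel surveys typically show.
changes = followups[0] - baseline
print("mean change:", changes.mean().round(2))
print("test-retest r:", np.corrcoef(baseline, followups[0])[0, 1].round(3))
```

The signature this leaves is exactly what the irregularities report describes: over-time changes indistinguishable from normally distributed noise, and implausibly high agreement between waves.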
The discoverers of the fraud, Broockman and Kalla, contacted Green with their case, and he was quickly convinced by their overwhelming evidence. Green, to his credit, then wrote to Science requesting that the paper be retracted. In the end, the system worked.
Even so, the social science world, as well as the journalists and lay people who follow it, is left trying to figure out what went wrong at every stage: from LaCour’s decision to fabricate the data to Green’s failure to notice, from Science‘s peer-review process to the credulous acceptance of the press and other political scientists. I am afraid that the culture and incentives at each of these four stages are such that this will happen again, perhaps often, and in cases with less publicity it will go unnoticed much longer. I will address each of the four stages in order.
The Graduate Student
The first and perhaps most pressing question is what motivated LaCour to commit fraud in the first place. No one can know with any certainty what, precisely, LaCour was thinking, but I think I have a pretty good idea. The experiment, as Green himself notes, was incredibly ambitious for a graduate student, not in its design, which is quite simple, but in its scope, its time commitment, and its cost. It was so ambitious that Green says he initially rejected the idea, but in the end LaCour convinced him he could pull it off. At some point, LaCour must have realized that he couldn’t. At that point he had two choices. First, he could admit a failure that had likely cost him a great deal of time and effort, would no doubt have resulted in significant delays in his research career and therefore in his graduate education, and, barring major research successes in the future, would have been a significant setback for his academic career as well. Or he could make up the data and hope that he did a good enough job of it that no one would ever notice. He chose the latter.
While I was not privy to the actual process, I suspect it went something like this: LaCour produced the data and the analyses and sent them to Green, who immediately recognized the importance of the finding. It was likely Green who suggested that they submit the findings to Science first, and I imagine this caused LaCour a great deal of anxiety, because a Science publication means a great deal of attention, and therefore scrutiny. But he had long since passed the point of no return: if he had told Green at this point that he had fabricated the data, Green would have contacted LaCour’s graduate adviser, who would have initiated an investigation, which would ultimately have resulted in LaCour being dismissed from the graduate program. From his perspective, he had no choice but to run with the fraud and hope that somehow, some way, no one ever noticed.
And between that time and someone finally, and inevitably, noticing, LaCour benefited greatly from the study. His name was all over the media, his career took off in the form of a position at Princeton upon finishing his PhD, and he earned the respect of many more senior researchers, who were eager to work with him to replicate and extend his findings. I imagine it was easy to get caught up in this, and perhaps at times he even convinced himself that everything was going to work out OK. At other times, I am sure, he experienced a great deal of anxiety. At no point, however, could he have done anything about it without ending his career as a political scientist. He was a prisoner of his own ill-gotten success.
The Professor
Green’s part is more difficult for me to understand. In interviews (e.g.), he makes it clear that he never saw the raw data and was never involved in the actual running of the study. As he tells it, he was only involved in some of the more advanced statistical analyses and the write-up. His excuse for this? His university’s Institutional Review Board (IRB) had never approved the study.
Why would this matter? IRB review is required for any research using human subjects, to ensure that the research is consistent with ethical guidelines. Without such a review, researchers cannot conduct a study using university resources, and they are certainly limited in their involvement with studies conducted elsewhere. But how limited? Can they not even look at raw data, as Green seems to be claiming? I cannot imagine this is true. Researchers request data from other researchers all the time, for the purpose of running their own analyses or replications, without having IRB approval for the study that produced the data. Surely a researcher planning to publish that data can request it as well? And if not, why would an experienced researcher, upon seeing such a dramatic effect, not go ahead and request approval so that he or she could see the primary data? Due diligence, especially for a finding of such importance, would seem to demand this.
These are questions Green will undoubtedly have to answer when his own department and university investigate this incident. From here, it looks like Green simply placed too much trust in LaCour, trust that was almost certainly augmented by the fact that he immediately recognized the importance of the finding, and the attention it would bring him. Green’s career may not have needed it as much as LaCour’s, but it would definitely be a huge feather in his cap.
The Journal
First, it is important to understand how Science works. It is one of the two most widely read journals in science generally, along with Nature, and both share a basic format: short papers (sometimes with more extensive online supplemental materials) on research with potentially broad interest and influence. Unlike most discipline-specific journals, these big multidisciplinary journals also have quick turnarounds: while it is not uncommon for discipline-specific journals to take months, even a year, to review a paper, Science and Nature might go from submission to acceptance or rejection in a matter of weeks. This means that they usually involve fewer reviewers per paper, but also that, given how many submissions they receive, they reject almost everything. The few papers that make it through are then published as soon as there is space available.
It should be clear, then, that Science relies on the integrity of the researchers who submit papers to them more than most journals do, because their peer review process cannot be as involved given their swift turnaround time and the number of submissions they receive. They simply do not have the time or the resources to require that reviewers closely check the data, as Broockman and Kalla did when they decided to replicate the study, and as reviewers for other journals may have done.
The implications of this are clear to me: if the name of an experienced researcher like Green had not been on that paper, Science would not have published it, because Science depended on his reputation to supplement its less-stringent review process. In a sense, Green failed Science more than Science itself failed.
Still, the editors and reviewer(s) should have recognized that the results were highly improbable, and made a conscious effort to do more than they usually do when reviewing papers for that publication. Or if they were really responsible, they shouldn’t have published it in the first place, recognizing that it was a study that needed more vetting, and perhaps more follow-up research, before it could be considered believable.
The Rest of Us
If we, that is, other political scientists, journalists, practitioners, and lay people interested in social science, are being honest with ourselves, our failure is perhaps as great as that at any other stage in this process. These results were, by any standard, unbelievable. That is, everything we knew about opinion and influence suggested that the effects we were reading about were, if not impossible, then at least highly improbable. Many people recognized this, but our trust in the scientific process, along with our excitement at the results, blinded us to its implications and made us forget our responsibility, as scientists or lovers of science, to be skeptical of any results, but particularly of improbable ones. That is supposed to be how science works, at every level: the more improbable or counterintuitive the finding, the more scrutiny everyone gives it. In this case, we all failed to scrutinize the finding sufficiently.
I do not mean to suggest that we should have assumed fraud; basic charity demands that we exhaust all other possible explanations before we get to that point. We should all have seen that there were likely other explanations, though: errors in the methods, the data collection, or the analyses, perhaps, or maybe just a statistical anomaly. We should have waited for replications before we got excited about the implications of the findings, and we certainly shouldn’t have used the results to guide policy or practice. We didn’t do any of this, and we have only ourselves to blame.
Will anything change as a result of this case? Likely not. Graduate students (and untenured professors) will still be under a great deal of pressure to produce publications, and this pressure will induce a few to commit fraud. The established researchers working with those few will have little incentive to take the time and effort to sufficiently check the work, and strong incentives to publish wave-making findings. Journals like Science will, and should, continue to have a quick turnaround, resulting in a less rigorous review process, even for improbable results, as long as those results are likely to make a big splash. And everyone else will be wowed by sexy research findings, as we so often are, regardless of how preliminary and implausible they may be. More fraud will occur and be published, and will be lauded until it is discovered. The only positive, at this point, is that in the vast majority of cases, and in all of the most visible ones, someone will catch it and the perpetrators’ careers will be over.
I do think it is possible to prevent most cases of fraud, but this would require working against the incentives already in place. In order to do so, I suggest we follow Chris’ Three Laws of Data (a sketch of what the Second Law’s check might look like follows the list):
First Law: If the data is too good to be true, it is not true. No need for “probably.”
Second Law: If your research partner brings you data that is too good to be true, check that shit.
Third Law: If the data is so good that it defies reasonable empirical and/or statistical explanation, see the First Law.
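For the Second Law, here is an illustrative sketch of what “checking that shit” could look like in practice. It reuses the invented setup from the earlier sketch; none of the function names, thresholds, or tests here come from LaCour’s data or from Broockman and Kalla’s actual analysis, which was considerably more sophisticated.

```python
# An illustrative version of the Second Law's check, under the same invented
# setup as the earlier sketch. The thresholds are arbitrary illustrations,
# not established forensic standards.
import numpy as np
from scipy import stats

def check_that_shit(baseline: np.ndarray, followup: np.ndarray) -> None:
    """Flag two of the anomalies reported in 'Irregularities in LaCour (2014)'."""
    changes = followup - baseline
    # Anomaly 1: over-time changes indistinguishable from pure Gaussian noise.
    # Real responses on coarse scales produce lumpy change distributions, so a
    # comfortably high Shapiro-Wilk p-value here is itself suspicious.
    _, p = stats.shapiro(changes)
    # Anomaly 2: implausibly high test-retest reliability.
    r = np.corrcoef(baseline, followup)[0, 1]
    print(f"Shapiro-Wilk p = {p:.3f}  (suspiciously normal if well above 0.05)")
    print(f"test-retest r  = {r:.3f}  (real panel surveys rarely exceed ~0.85)")
```

Run against the fabricated waves generated in the earlier sketch, both red flags light up. Diagnostics this crude would not prove fraud, but they would have been more than enough to justify asking to see the raw data.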
Following these worked for my colleagues and me in the story at the top of this post, and they would have worked for Green, the editors at Science, and the rest of us had we followed them in this case. We didn’t, and that’s on all of us.
UPDATE: Here is LaCour’s response to the allegations, at his website:
Statement: I will supply a definitive response on or before May 29, 2015. I appreciate your patience, as I gather evidence and relevant information, since I only became aware of the allegations about my work on the evening of May 19, 2015, when the not peer-reviewed comments in “Irregularities in LaCour (2014),” were posted publicly online.
I must note, however, that despite what many have printed, Science has not published a retraction of my article with Professor Green. I sent a statement to Science Editor McNutt this evening, providing information as to why I stand by the findings in LaCour & Green (2014). I’ve requested that if Science editor McNutt publishes Professor Green’s retraction request, she publish my statement with it.
This is the sort of non-denial denial that I can only assume was drafted by a lawyer.