Mean-centering Does Nothing for Multicollinearity!
I called my wife stupid yesterday, and I have yet to take it back and don’t think I will. Let me explain why.
There is a problem called multicollinearity:
Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related.
Wikipedia incorrectly refers to this as a problem “in statistics”. It is a statistics problem in the same way a car crash is a speedometer problem. Multicollinearity is actually a life problem and statistics measures how bad it is.
An example of multicollinearity that we’ve talked about here at OT is the effect of parental income on the incomes of their grown children. Parental income is likely to correlate with several things (the quality of the schools the student is likely to attend, accessibility of books and other resources, supplemental activities available, inherited student IQ, etc.). Without careful analysis, it can be difficult to determine what is actually leading to the children’s high incomes. Most just choose whichever explanations best complement their political worldviews.
I repeat: multicollinearity is not a statistics problem. It is a problem that has existed since before there were numbers. You won’t know whether the rooster crows because of the light or because it becomes warmer until you see whether it crows on a morning when the sunrise is accompanied by the arrival of a cold front. Until then, it could be one or the other waking the rooster.*
One of the things people do is use numbers to measure their problems. For multicollinearity, the standard metric is variance inflation factors (VIFs).
In general, high VIFs indicate you have significant multicollinearity problems while low ones indicate you might not. The measure is not perfect, and despite what you might read out there, there is no cut-off value for what is considered acceptably low.
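For the curious, the computation itself is simple. Here is a minimal NumPy sketch (the variables and numbers are invented for illustration): the VIF for predictor j is 1 / (1 − R²), where R² comes from regressing predictor j on all the other predictors.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns (plus an intercept).
    """
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Simulated data echoing the running example: "books read" tracks IQ closely.
rng = np.random.default_rng(0)
iq = rng.normal(100, 15, 500)
books = 0.1 * iq + rng.normal(0, 0.5, 500)
X = np.column_stack([iq, books])
print(vif(X))  # both VIFs come out large: iq and books are nearly collinear
```

The point of the sketch is only that VIF is a derived number: it is computed from the correlations among your predictors and nothing else.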
Then in 1991, Leona Aiken and Stephen West screwed everything up. They weren’t the first. Jaccard, Wan, and Turrisi (1990) made the same screw up. It was the 90s, I guess.
What they observed was that if you take the predictor variables and mean-center them, then the VIFs tend to go down. Mean-centering is where you subtract the average from each of the data points.
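You can reproduce what they observed with a few lines of simulated data (a sketch, not their code; the variable names are illustrative). In a moderated regression, the product term x·z is strongly correlated with x and z whenever their means are far from zero. Center the predictors and the VIFs collapse, even though the data themselves are untouched.

```python
import numpy as np

def vif(X):
    # VIF_j = 1 / (1 - R_j^2) from regressing column j on the other columns
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        r2 = 1 - ((y - others @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        vifs.append(1 / (1 - r2))
    return np.array(vifs)

rng = np.random.default_rng(1)
x = rng.normal(100, 15, 400)   # e.g. parental IQ: mean far from zero
z = rng.normal(3, 1, 400)      # e.g. books read per month

raw = np.column_stack([x, z, x * z])          # predictors plus interaction
xc, zc = x - x.mean(), z - z.mean()
centered = np.column_stack([xc, zc, xc * zc])

print(vif(raw))       # the interaction term's VIF is enormous
print(vif(centered))  # the VIFs fall toward 1, yet the data are unchanged
```

The VIFs moved; the information content of the data did not.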
Paint the picture in your mind now. You are trying to figure out why kids of rich parents end up with higher incomes. You’ve narrowed it down to two candidates: IQ that children inherit from their parents, or parents reading more books to their kids. But almost all the high-IQ parents you talked to also read books to their kids, and none of the low-IQ parents read books to their kids, so you can’t confidently say which one it is.
Then, you read Aiken and West and follow their recommendations. You subtract the mean IQ of 100 from each of the IQ scores. (So an IQ of 90 is now -10 and an IQ of 110 is now +10.) Also, you subtract the mean books read per month of 3 from each of the books-read scores. (So reading 0 books per month is now -3 and reading 7 books per month is now +4.)
Then you calculate your VIFs and they indeed went way down. Have you solved your multicollinearity problem?
No. Changing the scale of IQ and number of books read didn’t actually give you more or better data. The root problem is still there: you have no high-IQ parents who didn’t read and no low-IQ parents who did. All you did was succeed in changing the metric used to measure the problem: the VIFs.
This is the equivalent of trying to reduce the severity of a car accident by switching your speedometer from miles per hour to nautical miles per hour. Your numbers will change to sound acceptably lower, but you are still in exactly the same situation you were in before. Changing the numbers used to describe a problem doesn’t change the problem.
Science has propagated this error far and wide. Aiken and West (1991) now have 25,769 scholarly citations according to Google. That is breathtaking.
Here is a gated paper from the May issue of Strategic Management Journal, which is in my humble opinion the best business journal to actually read. The authors, who are each no doubt talented scholars, nevertheless make the same error as those before them:
We mean centered predictor variables in all the regression models to minimize multicollinearity (Aiken and West, 1991). The variance inflation factors for all independent variables were below the recommended level of 10.
These are smart people doing something stupid in public. They have reified a statistical measure of multicollinearity and mistaken it for actual multicollinearity. They should each feel bad, but they are not close to being unique. I pick on them only because they are recent and in a prestigious journal I have immediate access to.
Contrary to Troublesome Frog’s claim in a comment a couple of years ago, academics reify concepts all the time. It is entirely likely and plausible that an economist may spend so much time and effort and passion thinking about how to increase employment that he has forgotten what employment was supposed to be good for in the first place. That’s exactly the kind of mistake whose likelihood increases with the amount of time and enthusiasm spent working on a problem.
But what do you do about Aiken and West (1991)? Their work has the momentum of a thousand suns. Raj Echambadi and James Hess wrote an article titled quite clearly “Mean-Centering Does Nothing for Moderated Multiple Regression” in the highly regarded and widely read Journal of Marketing Research:
…we show the following: 1) in contrast to Aiken and West’s (1991) suggestion, mean-centering does not improve the accuracy of numerical computation of statistical parameters, 2) it does not change the sampling accuracy of main effects, simple effects, and/or interaction effects (point estimates and standard errors are identical with or without mean-centering), and 3) it does not change overall measures of fit such as R² and adjusted-R². It does not hurt, but it does not help, not one iota. [Vik: emphasis added]
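Their claim is easy to check numerically. The sketch below (simulated data, not their code) fits the same moderated regression with raw and with mean-centered predictors. Because centering is just an invertible change of basis for the design matrix, the fitted values, the R², and the interaction coefficient come out identical.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(100, 15, 300)
z = rng.normal(3, 1, 300)
y = 0.5 * x + 2.0 * z + 0.05 * x * z + rng.normal(0, 1, 300)

def fit(x, z):
    """OLS fit of y on [1, x, z, x*z]; returns coefficients, fits, R^2."""
    X = np.column_stack([np.ones(len(x)), x, z, x * z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ beta
    r2 = 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return beta, yhat, r2

b_raw, yhat_raw, r2_raw = fit(x, z)
b_c, yhat_c, r2_c = fit(x - x.mean(), z - z.mean())

print(np.allclose(yhat_raw, yhat_c))   # identical fitted values
print(np.isclose(r2_raw, r2_c))        # identical R^2
print(np.isclose(b_raw[3], b_c[3]))    # identical interaction coefficient
```

The main-effect coefficients change numerically after centering, but only because they now describe simple effects at the mean; everything the model actually says about the data is the same.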
That was in 2004, and it was to no avail. Aiken and West (1991) shows no sign of slowing down. Correct information in this case doesn’t appear to displace incorrect information.
This should not be surprising. No one will particularly miss LaCour. He didn’t make anyone’s life easier. Aiken and West (1991), in contrast, is a very useful tool in every empiricist’s toolbox. Why would researchers throw away something that helps them get published? For that, Echambadi and Hess would have to publish what to do instead, and it’d have to be at least as easy as Aiken and West’s ineffectual approach.
The dirty secret is that no one is really interested in whether their data has multicollinearity or not. Why would people invest a lot of time and effort trying to identify problems in their own data? They just want a way to get past all the hoops so they can report their results.
In this way, I differ from Chris, who seems to see great danger in the search for sexy results. This critique, in my view, is difficult to differentiate from “stop looking at problems that matter to people!” He says “it creates perverse incentives for researchers”, but these are the exact same incentives that convince researchers to work on anything meaningful. And what really should consumers of research do? Should we not pay attention to articles that are interesting and have implications for society? That seems as unwise as it is untenable.
What worries me isn’t a march towards sexiness. It’s a march towards publishing irrespective of whether someone actually believes what they are publishing is true or not. We have a system that rewards statistical significance rather than honesty. We do have checkboxes in place that are designed to keep out those seeking to hack their way into a journal. The most famous is the requirement that p values be less than 0.05. Almost everyone, however, seems to regard this as a bureaucratic hoop rather than a way to sort between true and false claims. If you don’t like your results on the first run, add or subtract control variables until you do. If that doesn’t work, there are a dozen other options available.
Similarly, checks for measurement reliability, validity, and multicollinearity are made only because they are a burden to be borne. They are executed by people thinking “how do I pass this validity test?”, not by those thinking “does my data pass this validity test?” If you run into problems, you search for a different test that you can pass or do different manipulations until you get through. In fields where collecting data can take years, not publishing work from a dataset is not an option. In my view, this is a greater threat to science than data fabrication.
* Or an internal clock. Or a temperature change whether it is in a positive or a negative direction. I don’t really know much about roosters.