Let’s Get Empirical!
As I mentioned in my post yesterday, I don’t really have a post’s worth of material on the subject of guns.
However, while reaching this conclusion, I realised there was a contribution I could make to this topic. Whenever the social effects of guns are debated, studies on the relationship between guns and violent crime are often produced by both sides. On the whole, this is a good thing – policy should be evidence based, unless you’re holding to a strictly deontological position.
I have no special insights to offer on the merits of any given study – crime isn’t a subject I’ve studied very much. But I’m still conversant with the basic empirical methods used to research a question like this, so I thought it might be useful to examine the empirical challenges researchers have to face in investigating an issue like guns and their effect on crime. This will hopefully help people to know whether the authors of any given paper have done their homework.
Before you can even think of researching a question like “What effect do guns have on crime?”, you have to clearly define “guns” and “crime”. This is less straightforward than it may first appear, for example:
- Do you look at all crimes equally? This means that preventing 2 cases of trespassing or petty theft cancels out causing 2 murders.
- Or you could look at violent crime only, though this will lead to excluding non-violent offences that still affect people’s quality of life, and it still bundles common assault with murder.
- If you’re feeling fancy, you could build a severity-weighted crime index. Of course this creates a variable that is vulnerable to police “juking” the crime statistics, since it now matters to your analysis how a given crime is classified.
- In practice, crime researchers often focus on homicides: it is the most severe type of crime, and the data quality is the best – you can’t juke a dead body.
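To make the severity-weighting idea concrete, here is a toy sketch. The offence categories and weights are pure assumptions picked for illustration, not an established scale:

```python
# A toy severity-weighted crime index. The weights below are
# illustrative assumptions, not a validated scale.
SEVERITY_WEIGHTS = {
    "petty_theft": 1,
    "common_assault": 5,
    "robbery": 20,
    "homicide": 1000,
}

def crime_index(counts):
    """Sum of offence counts, each weighted by its assumed severity."""
    return sum(SEVERITY_WEIGHTS[offence] * n for offence, n in counts.items())

city_a = {"petty_theft": 500, "common_assault": 50, "robbery": 10, "homicide": 2}
print(crime_index(city_a))  # 500*1 + 50*5 + 10*20 + 2*1000 = 2950
```

Note that reclassifying a single homicide as an assault moves this index far more than adding hundreds of thefts would – which is exactly why such an index is sensitive to how police classify crimes.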
For guns, an important consideration is how to deal with different types of guns. Is a semi-auto rifle the same as a shotgun, the same as a revolver, the same as a refurbished flintlock (and if not, how are you going to treat them differently)? Also, are we counting guns, or people with guns?
There are no objectively superior measures of crime and guns; it all depends on what you can get data for and what hypothesis you are trying to test. The important thing to ask is “What are the authors actually measuring here?”, because if an article is being cited in support of or opposition to a proposed gun control policy, the specific hypothesis matters. A law that would change the composition of guns that are owned, but not their number, can’t be tested by looking at the total number of guns. Equally, a measure that would limit how many guns a person can own can’t be evaluated by comparing the proportion of gun owners to crime rates.
Finally, check that the authors aren’t comparing different population centres (or the same centre over a long time period) without adjusting for population size. After all, all things being equal, a city that is 10 times the size of another will have 10 times the crime, and 10 times as many guns. This is a rookie mistake (or a sign of shenanigans), so you shouldn’t see this problem very often, but it’s still something to look out for.
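The adjustment itself is just a per-capita rate. A quick sketch with invented numbers:

```python
# Normalise raw crime counts to a rate per 100,000 residents before
# comparing places of different sizes. The figures are made up.
def per_100k(count, population):
    return count / population * 100_000

big_city = per_100k(800, 8_000_000)  # 10.0 homicides per 100k
small_town = per_100k(12, 100_000)   # 12.0 homicides per 100k
print(big_city, small_town)
```

The big city has roughly 67 times the raw homicide count, but the small town actually has the higher *rate* – which is the comparison that matters.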
Signal and Noise
Once you’ve settled on your measures, the next step is to work out how much of the change in your dependent variable (the thing you are trying to understand) is correlated with changes in your explanatory variable (the variable you think causes changes in the dependent variable), as opposed to any of the other things that might be affecting it.
Figuring out what causes what is called the attribution problem, and it has been an ongoing problem since the dawn of humanity’s search for knowledge. The best solution is the controlled experiment, where you set up two identical situations, change one thing in one of them, and watch what happens. This gives you nice, clean comparative data. However, it is generally very difficult to perform a controlled experiment in the social sciences (or even in some of the physical sciences), which means that attribution has to be determined through less direct methods.
When looking at real world data, rather than experimental data, you cannot simply assume that any change in your dependent variable can be explained by changes in your explanatory variable. For example, consider the following argument:
1 – The UK has a lower murder rate than the US.
2 – The UK has stricter gun controls than the US.
3 – Therefore, introducing stricter gun controls in the US will lower the murder rate.
Point 3 does not follow from points 1 and 2. Aside from the fact that correlation isn’t the same thing as causation, there are any number of other factors (collectively referred to as confounds) that could be affecting the murder rate other than gun control. Here’s a list I came up with after thinking about it for a minute:
- Population density
- Law enforcement approaches
- Income levels
- Levels of social trust
- Cultural views on the acceptability of violence as a dispute-resolution mechanism
- Preferences for owning guns (i.e. it may be that UK gun control does nothing, but because people there don’t want as many guns as Americans do, the result is still less violence)
The only way to sort out this mess of confounding variables is to use statistics to estimate the effect (if any) of all these variables. This means a lot of data (i.e. more than two countries), and looking at all the data at once (not just comparing two variables at a time). If you don’t account for all the confounding variables (or, more realistically, as many as you can), then you will end up giving too much weight to the variables you did put in your model.
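To see why leaving a confounder out distorts the result, here is a small self-contained simulation. All the data and coefficients are invented for illustration: a confounder z drives both the explanatory variable x and the outcome y, so regressing y on x alone overstates x’s effect, while including z recovers the true coefficients:

```python
# Omitted-variable bias demo. Synthetic data where the true model is
# y = 2*x + 3*z (coefficients are assumptions chosen for illustration),
# and the confounder z also feeds into x.
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            M[r] = [mr - f * mi for mr, mi in zip(M[r], M[i])]
    beta = [0.0] * 3
    for i in reversed(range(3)):
        beta[i] = (M[i][3] - sum(M[i][j] * beta[j] for j in range(i + 1, 3))) / M[i][i]
    return beta

# x = z + u, where u is independent variation; y = 2*x + 3*z exactly.
data = [(z + u, z, 2 * (z + u) + 3 * z) for z in range(5) for u in range(5)]
xs, zs, ys = zip(*data)
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Naive simple regression of y on x, ignoring z:
naive = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

# Multiple regression y = a + b*x + c*z via the normal equations:
X = [(1.0, x, z) for x, z in zip(xs, zs)]
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(3)]
a, b, c = solve3(XtX, Xty)

print(naive)  # 3.5 -- credits x with some of z's effect
print(b, c)   # ~2.0 and ~3.0 -- the true effects recovered
```

The naive estimate of 3.5 is exactly the kind of inflated coefficient the paragraph above warns about: the weight belonging to the omitted variable gets loaded onto the variables left in the model.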
Whenever you’re reading a social science study ask yourself “What variables did the authors control for? Are there other variables they could reasonably have controlled for?”
Endogeneity and Causation
The statistical methods I described above will deal with most of your data-related needs. Sure, there are hundreds of modifications one might need to employ to deal with particular quirks with your data, but the essential principle is the same.
There is, however, one complication that is harder to deal with – endogeneity. Endogeneity is a situation where your dependent and explanatory variables have a causal effect on each other. For example, if guns cause crime and higher crime drives demand for guns (or the reverse, or any combination of those things) then we have a serious problem. The statistical approaches to data analysis I mentioned above only work when causal arrows run in one direction. When you have endogenous variables, there’s no way to disentangle the two causal effects (A causing B and B causing A), which means that you inevitably end up with an utterly misguided analysis.
There is no mathematical trick to get around this problem; the only solution is to look for an instrument – a source of change in the explanatory variable that couldn’t possibly have been caused by the dependent variable. For example, if a pack of ravenous rust monsters descended on the United States and destroyed most of the guns in the country, we could see what happened to crime rates following the Great Rust Monster Incident of 2013 without wondering if it was crime affecting the number of guns (unless the rust monsters were released by some nefarious Dungeons and Dragons themed master criminal, in which case all bets would be off). This approach to analysing endogenous variables has been most notably popularised by Freakonomics. While the approach Levitt and Dubner take to analysing serious phenomena in quirky ways may seem like light-hearted entertainment to the uninitiated, it serves a serious purpose – it is when weird things happen that instruments are most likely to be found. And since researchers can’t create an instrument and have to wait for one to turn up, every anomalous event is a potential research paper waiting to be written. Instrumental variable analysis also helps resolve the problem of causation, since you’ve sorted out which way the causal arrow runs before you even start crunching numbers.
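A toy sketch of the instrumental-variable idea, with all numbers invented: an unobserved shock e raises both guns x and crime y, so ordinary regression is biased, but an instrument w (think of the rust monsters) shifts x while being unrelated to e:

```python
# Instrumental-variable demo on synthetic data. True model: y = 2*x + 3*e,
# where e is an unobserved shock that also feeds into x (endogeneity).
# The instrument w moves x but has no independent path to y.
# All coefficients are assumptions chosen for illustration.
data = [(w + e, w, 2 * (w + e) + 3 * e) for w in range(5) for e in range(5)]
xs, ws, ys = zip(*data)
n = len(xs)
mx, mw, my = sum(xs) / n, sum(ws) / n, sum(ys) / n

def cov(a, b, ma, mb):
    """Sum of cross-deviations (scaling cancels in the ratios below)."""
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

ols = cov(xs, ys, mx, my) / cov(xs, xs, mx, mx)  # 3.5 -- biased by e
iv = cov(ws, ys, mw, my) / cov(ws, xs, mw, mx)   # 2.0 -- true effect
print(ols, iv)
```

The IV estimator only uses the variation in x that the instrument induces, which is why the bias from the two-way causal tangle drops out.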
When reading a study ask yourself “Is this variable likely endogenous? If so, have the authors used an instrument?” If they haven’t, then the study is highly suspect.
There is a lot more about this subject I could go into, but things would start to get pretty technical from here on in, and I think these are the primary things a lay audience would be able to spot. So whenever you are reading an empirical study, especially if you agree with the conclusions – remember to ask these questions and you’ll keep out of trouble for the most part:
- What are the authors actually measuring here?
- What variables did the authors control for? Are there other variables they could reasonably have controlled for?
- Is this variable likely endogenous? If so, have the authors used an instrument?