So You Want To Draw a Cartogram
I’m fond of map-like things. A bit over three years ago, I decided that I wanted to draw a cartogram. Cartograms are those maps where the various shapes have been distorted to represent some variable other than their geographic area. The front-page image for this post is an example that has become standard after Presidential elections. The colors indicate which party’s candidate won in a state and the size of each state represents the number of its electoral votes. The particular cartogram I wanted to draw was one where the size of the states represents the size of the federal land holdings in each. “I encounter these things on a regular basis,” I thought to myself. Then I muttered the five little words that have gotten me into trouble from time to time: “How hard can it be?”
Being a pseudo-academic, the obvious starting point was a literature search. There are a variety of types of cartogram; the one I wanted to draw was a contiguous cartogram. There was quite a collection of algorithms for doing the distortion in contiguous cartograms. Some were faster and some produced “nicer” results. The general consensus seemed to be that the current state of the art was the gas-diffusion pressure-equalizing model developed by Gastner and Newman. There were add-ons implementing that algorithm for the commercial map tools ArcGIS and MapInfo; the household research budget wouldn’t support a license for either one of those. There was an implementation included in a free tool called ScapeToad written in Java; I didn’t care for either the user interface or the results. Mark Newman had made an implementation of the core algorithm available, written in vanilla C, which compiled readily on my Mac. “Some Perl wrapped around this might be an answer,” I thought. “How hard can it be?”
Simple digital maps are just a collection of polygons. It’s amazing what kind of odds and ends you can find on Wikipedia. I found an outline map of the 48 contiguous states with the polygon data for each state in a fairly readable form. A bit of Perl code extracted that data and tucked it away in a file that was even easier to read (when in doubt, reinvent the wheel). I found Wikipedia tables with the area of each state. I found tables in various places with the area of the federal land holdings in each state. As it turned out, I needed to draw polygons for more than the obvious reason. There’s a Perl wrapper for the open-source GD graphics library that makes polygon drawing pretty easy. It took a day to figure out how to actually make Dr. Newman’s programs work and a couple of weeks part-time to code up a version of my own stuff. Then one afternoon I had a surprisingly good-looking cartogram, similar to the one shown here. (If I’ve done things right, you should be able to right-click “View Image” on any of the images here to get a bigger version.)
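One plausible "less obvious" use for polygon filling is building the grid of density values that a diffusion-based cartogram program works from. The sketch below is only an illustration of that idea, written in Python rather than Perl. The function names, the grid size, and the sample square are all made up, and it uses a simple ray-casting point-in-polygon test in place of GD's polygon fill.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def density_grid(regions, width, height, background=1.0):
    """Fill a height x width grid with each region's density.

    regions: list of (polygon, density) pairs in grid coordinates.
    Cells outside every region keep the background density.
    """
    grid = [[background] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # Sample at the center of the cell
            cx, cy = col + 0.5, row + 0.5
            for polygon, density in regions:
                if point_in_polygon(cx, cy, polygon):
                    grid[row][col] = density
                    break
    return grid


# Hypothetical example: one square "state" with a made-up density of 5.0
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
grid = density_grid([(square, 5.0)], width=4, height=4)
```

Testing every cell against every polygon is slow for real maps, which is one reason to let a graphics library do the filling instead.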
There’s an old saying that if the only tool you have is a hammer, all problems look like nails. There should be a corollary that says once you’ve built a hammer, you go looking for nails. The next time I wanted to draw a cartogram it needed county-level data. Doing the same thing as before with county-level outlines from Wikipedia worked… sort of. At that level, there were a fair number of errors in the outlines that I had to find and fix. Putting together maps of arbitrary groups of states meant tolerating some odd orientation effects. Searching for a better set of shapes led to the amazing collection of outline information maintained by the US Census Bureau. But using that meant learning how to handle Esri-standard “shapefiles”. Each shapefile is actually a collection of files, most containing binary data unreadable by humans. Unsurprisingly, there’s a Perl module for extracting shapefile information that makes the task easier.
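To give a feel for what "binary data unreadable by humans" means, here is a minimal sketch that decodes the fixed 100-byte header at the start of a shapefile's main .shp file. It is written in Python and is not the Perl module mentioned above. The function name and the sample values are made up for illustration, and the header is built in memory rather than read from a real Census file.

```python
import struct

# Shape type codes for the common two-dimensional shapes
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def parse_shp_header(data):
    """Decode the fixed 100-byte header at the start of a .shp file."""
    if len(data) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    # File code and file length are big-endian 32-bit integers.
    (file_code,) = struct.unpack(">i", data[0:4])
    (length_words,) = struct.unpack(">i", data[24:28])
    if file_code != 9994:
        raise ValueError("not a shapefile: bad file code")
    # Version, shape type, and the bounding box are little-endian.
    version, shape_type = struct.unpack("<ii", data[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    return {
        "length_bytes": length_words * 2,  # length is stored in 16-bit words
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, str(shape_type)),
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Synthetic header for a polygon file; the bounding box values are made up.
header = (
    struct.pack(">i", 9994)            # file code
    + bytes(20)                        # five unused integers
    + struct.pack(">i", 50)            # file length in 16-bit words
    + struct.pack("<ii", 1000, 5)      # version, shape type (5 = Polygon)
    + struct.pack("<4d", -109.05, 36.99, -102.04, 41.0)  # xmin, ymin, xmax, ymax
    + bytes(32)                        # Z and M ranges, unused here
)
info = parse_shp_header(header)
```

With a real file, the same function could be handed the first 100 bytes read in binary mode. The mix of big-endian and little-endian fields in one header is a good hint as to why a ready-made module is welcome.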
The Census Bureau’s outlines all use latitude and longitude for their coordinate system. But using latitude and longitude directly creates its own distortions in the size of areas. We all get exposed to the Mercator projection of the world in grade school, in which Greenland looks to be bigger than South America or Africa. Solving that meant learning more about the details of map projections, the art/science of representing a portion of a sphere on a flat piece of paper. That led to discovering the rather remarkable cs2cs program maintained by the PROJ.4 project that translates between almost any map coordinate systems in common use. I mostly use the Lambert azimuthal equal-area projection, named after 18th-century Swiss mathematician Johann Lambert, who created it.
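For the curious, the forward Lambert azimuthal equal-area projection is short enough to write out. The sketch below uses the textbook formulas for a sphere, in Python. It is not how cs2cs does it (PROJ works on an ellipsoid and handles far more cases), and the function name, the mean Earth radius, and the sample center point are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical approximation


def lambert_azimuthal_equal_area(lat, lon, lat0, lon0, radius=EARTH_RADIUS_KM):
    """Project (lat, lon) in degrees to planar (x, y), centered on (lat0, lon0)."""
    phi, lam = math.radians(lat), math.radians(lon)
    phi0, lam0 = math.radians(lat0), math.radians(lon0)
    dlam = lam - lam0
    # Cosine of the angular distance from the projection center
    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(dlam))
    denom = 1.0 + cos_c
    if denom <= 1e-12:
        raise ValueError("the point opposite the center cannot be projected")
    k = math.sqrt(2.0 / denom)
    x = radius * k * math.cos(phi) * math.sin(dlam)
    y = radius * k * (math.cos(phi0) * math.sin(phi)
                      - math.sin(phi0) * math.cos(phi) * math.cos(dlam))
    return x, y


# Hypothetical usage: a point in Colorado, projected around a made-up
# center somewhere in the Great Plains. Results are in kilometers.
x, y = lambert_azimuthal_equal_area(39.74, -104.99, lat0=40.0, lon0=-100.0)
```

Because the projection preserves area, polygon areas computed from the projected x and y values are proportional to areas on the sphere, which is what a cartogram needs as its starting point.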
Handling state names, or postal abbreviations for those, is easy because there are only 49 of them (48 contiguous states and the District of Columbia; Alaska and Hawaii are problematic for contiguous cartograms). There are something over 3,000 county-equivalents in the US, many with similar names. Enter the federal FIPS system, which assigns each state and county a unique number. Most county-level data tables include the FIPS identifier. The two maps above showing Colorado, Kansas, and Nebraska (and a hypothetical rural Great Plains state) reflect the (subtle) improvements all of these things made possible.
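A county FIPS code is five digits: two for the state followed by three for the county. Joining data tables on it works well as long as the leading zero survives, which it often does not once a spreadsheet has treated the code as a number. The Python sketch below shows one way to normalize the keys. The function names and the population values are made up; the state prefixes in the comments (08, 20, 31) are the real codes for Colorado, Kansas, and Nebraska.

```python
def normalize_fips(value):
    """Return a county FIPS code as a zero-padded 5-character string."""
    text = str(value).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    return text.zfill(5)


def split_fips(value):
    """Split a county FIPS code into (state part, county part)."""
    code = normalize_fips(value)
    return code[:2], code[2:]


# Keys as they might arrive from different sources; the values are made up.
raw_values = {
    8031: 100.0,        # Colorado (08), leading zero lost
    "20173": 200.0,     # Kansas (20)
    " 31055 ": 300.0,   # Nebraska (31), stray whitespace
}
by_fips = {normalize_fips(key): value for key, value in raw_values.items()}
```

With every table keyed the same way, matching a county's outline to its data becomes a plain dictionary lookup, and similar county names in different states stop being a problem.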
By the time I’d finished all that, the question “How hard can it be?” wasn’t really appropriate any more. Adding features from time to time had become a minor obsession. For example, county shapes are often irregular. A rectangular mesh can do a better job than the county outlines for showing relative density of some variable. The figure to the left shows New Mexico and Texas, first as a flat map with an overlaid mesh instead of county outlines, and then as a cartogram based on county population density. Using a mesh provides more lines, as well as more regular lines, and causes even the smaller metro areas to appear to bulge out clearly. This particular cartogram shows at least three things nicely: (a) just how much of the Texas population is concentrated into the state’s metro areas; (b) how much emptier the rural areas of west Texas are than those of east Texas; and (c) that El Paso and its suburbs ought to be the preeminent city of New Mexico, rather than being tacked on to Texas. Doing the mesh “right” required getting a whole bunch of picky little details right.
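One detail that is easy to illustrate: a mesh line stored as just two endpoints stays a straight line however its endpoints move, so each line has to be sampled at many points before the cartogram transformation is applied. The Python sketch below generates such a mesh over a bounding box. It is not my actual code; the function names, the step count, and the toy transformation are made up, and it ignores clipping the mesh to the state outlines.

```python
def mesh_lines(xmin, ymin, xmax, ymax, nx, ny, steps=50):
    """Return mesh lines over a bounding box as lists of (x, y) points.

    nx and ny are the number of cells across and up, giving nx + 1
    vertical and ny + 1 horizontal lines. Each line is sampled at
    steps + 1 points so it can bend when transformed.
    """
    lines = []
    for i in range(nx + 1):
        x = xmin + (xmax - xmin) * i / nx
        lines.append([(x, ymin + (ymax - ymin) * s / steps)
                      for s in range(steps + 1)])
    for j in range(ny + 1):
        y = ymin + (ymax - ymin) * j / ny
        lines.append([(xmin + (xmax - xmin) * s / steps, y)
                      for s in range(steps + 1)])
    return lines


def transform_lines(lines, transform):
    """Apply transform(x, y) -> (x, y) to every point of every line."""
    return [[transform(x, y) for x, y in line] for line in lines]


# Toy stand-in for the cartogram transformation: stretch x as y grows.
def toy_transform(x, y):
    return x * (1.0 + 0.1 * y), y


mesh = mesh_lines(0, 0, 4, 2, nx=4, ny=2, steps=10)
warped = transform_lines(mesh, toy_transform)
```

In the real thing, the transformation would come from the cartogram program's output rather than from a formula.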
I have a list of things that need to be done. Building a new cartogram that uses a new subset of states/counties requires multiple steps; I’d like to just hand one data file to one piece of software and have it do all of those steps. Given that, I’ve been thinking about setting up a small website where people could submit a data file of their own and get a cartogram e-mailed back to them. Animations that showed a flat map morphing into a cartogram would be cool. Not particularly useful, but cool. Documentation. A Perl module on CPAN. Maybe rewrite the whole darned thing in Python since that language seems to be displacing Perl in many places.
So, what kind of minor obsessions have you found yourself stuck with?
Michael T. Gastner and M. E. J. Newman, “Diffusion-based method for producing density equalizing maps,” Proc. Natl. Acad. Sci. USA 101, 7499-7504 (2004).
There are Perl modules for doing an amazing number of things. CPAN, the Comprehensive Perl Archive Network, contains over 150,000 modules written by over 12,000 different authors. Sometimes it’s more of a surprise when there isn’t a Perl module for a particular task.
All images by the author, released here under the Creative Commons Attribution license. “Cartogram by Michael Cain” is a good enough attribution.