A few weeks back, President Obama made remarks about consumer needs for strong encryption versus law enforcement and national security needs for access to private data. I complained at the time that the President simply didn’t get it: there’s no such thing as making the encryption “a little bit” weak. Over the last ten days, it’s been Congress’s turn, and they don’t get it either. On Monday, a House committee held hearings on the problem. The politicians wound up in the same place as the President: the answer is some compromise, which must exist because the politicians say it must, that gives law enforcement access yet is strong enough to keep the bad guys out. They’re just as wrong as the President was about the existence of such a compromise.
The Senate found a new way to be wrong. Last week, Senators Richard Burr (R-NC) and Dianne Feinstein (D-CA) released the proposed text for their update to the All Writs Act. The AWA dates to 1789 and gives the courts authority to order all actions necessary to their function. The AWA has been in the headlines lately because of a federal court order requiring Apple to assist the FBI in decrypting the contents of an iPhone used by one of the San Bernardino shooters. The new mistake is language that requires tech companies to provide court-ordered data “in an intelligible format,” or to provide the technical assistance the government needs to obtain the data in such a format. What does intelligible mean? The draft spells it out:
(10) INTELLIGIBLE. — The term “intelligible”, with respect to information or data, means — (A) the information or data has never been encrypted, enciphered, encoded, modulated, or obfuscated; or (B) the information or data has been encrypted, enciphered, encoded, modulated, or obfuscated and then decrypted, deciphered, decoded, demodulated, or deobfuscated to its original form.
The authors have tried to cover a lot of bases, and overreached as a result. There is an implicit requirement that encodings of all sorts be reversible — that the original data can be recovered. While ciphers fit that requirement by design, it’s not necessarily true for other types of encoding. Some of them are “lossy” — data is intentionally discarded (lost) and can’t be recovered.
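To make “reversible by design” concrete, here is a minimal sketch of a toy cipher. It is an assumption-laden illustration, not a secure scheme and not anything the draft refers to: the function name `xor_cipher` and the key are made up for the example. The point is only that decrypting with the right key gives back the original bytes exactly.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy XOR cipher (illustration only, NOT secure).

    XOR-ing each byte with a repeating key is its own inverse,
    so the same function both encrypts and decrypts.
    """
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


message = b"Mike is a silly twit"
key = b"not a real key"  # hypothetical key for the example

ciphertext = xor_cipher(message, key)
recovered = xor_cipher(ciphertext, key)

# The round trip restores every byte of the original.
print(recovered == message)
```

A real cipher is vastly more complicated, but it shares this property: with the key, nothing is lost.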
Image compression is an easy case. A raw grayscale image typically consists of eight bits for each pixel, representing different shades of gray from black to white. The top image on the left is such an image. When encoded according to the JPEG standard, the size of the file representing the image is decreased dramatically. More importantly for this discussion, exactly reversing the encoding in order to recover the original data is not technically possible. The middle image is a reconstruction from the JPEG version that was about one-third the size of the original. To a human (or to software mimicking human image processing, such as facial recognition) the two images are nearly indistinguishable. JPEG is designed to preserve the bits that matter to human vision and discard the ones that don’t. But a JPEG-encoded image would certainly seem to be a technical violation of the draft statute: the original form of the data can’t be recovered.
(The bottom image compares the previous two, pixel by pixel. In the comparison, the lighter the pixel the greater the difference in the corresponding pixels in the two images. Most of the pixels are wrong, but in ways that don’t interfere with how humans see things.)
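The same effect can be sketched in a few lines. What follows is not JPEG, which works on frequency coefficients of pixel blocks, but a much simpler stand-in that shares the lossy step: values are quantized into coarse buckets, and the exact originals are gone. The function names and the tiny 2×4 “image” are assumptions made up for the illustration.

```python
def quantize(pixels, step=16):
    """Lossy encode: keep only which `step`-wide bucket each 8-bit value is in."""
    return [[p // step for p in row] for row in pixels]


def reconstruct(codes, step=16):
    """Decode each bucket index to the midpoint of its bucket."""
    return [[c * step + step // 2 for c in row] for row in codes]


def difference(a, b):
    """Per-pixel absolute difference; larger values mean a bigger error."""
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical grayscale image: a dark region next to a light one.
original = [
    [12, 13, 200, 201],
    [14, 15, 202, 203],
]

encoded = quantize(original)
decoded = reconstruct(encoded)
diff = difference(original, decoded)

print(decoded)  # close to the original, but not identical
print(diff)     # small nonzero errors in most pixels
```

Every dark pixel decodes to the same value, so nothing in the encoded data says whether a given pixel started as 12 or 15. The picture still looks right; the original form is unrecoverable.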
Images aren’t the only place this happens. For a time, “disemvoweling” objectionable comments was a fairly common blogging practice. The blog owner removed all of the vowels from the comment text, a form of obfuscation. “Mike is a silly twit” was replaced by “Mk s slly twt”. The comment was still presented inline, and the content could usually be reconstructed, but that took some effort and most people would simply skip over it instead. This scheme renders both of the phrases “fiery lava” and “fairy love” as “fry lv”. In context, which expansion is correct may be clear. Sometimes it’s not. The token “lv” is particularly difficult here. Alive, lava, lave, leave, levee, live, love, and olive are all possible. As is the case with image encoding, the text obfuscation is lossy and cannot be reversed by any fixed algorithm. Human judgment is often required.
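Disemvoweling is simple enough to sketch directly. The code below is one plausible version, assuming that “y” is not treated as a vowel and that a word made entirely of vowels simply disappears, which matches the examples above. The `VOCABULARY` list is a made-up stand-in for a real dictionary.

```python
VOWELS = set("aeiouAEIOU")

# Hypothetical mini-dictionary used to look for possible expansions.
VOCABULARY = [
    "alive", "lava", "lave", "leave", "levee", "live", "love", "olive",
    "lift", "fiery", "fairy",
]


def disemvowel(text: str) -> str:
    """Remove vowels from each word; drop words that become empty."""
    stripped = ("".join(ch for ch in word if ch not in VOWELS)
                for word in text.split())
    return " ".join(word for word in stripped if word)


def candidates(token: str, vocabulary) -> list:
    """All vocabulary words that disemvowel to the given token."""
    return [word for word in vocabulary if disemvowel(word) == token]


print(disemvowel("Mike is a silly twit"))
print(disemvowel("fiery lava") == disemvowel("fairy love"))
print(candidates("lv", VOCABULARY))
```

Going forward is mechanical. Going backward, the best any program can do is return the list of candidates and leave the choice to someone who understands the context.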
Most frustrating about all of this is the implicit assumption that the bad guys are uniformly stupid. There’s an assumption that a terrorist leader, for example, will write messages in plain language and rely on a cipher alone to protect them. Ciphers are only one method of passing secret information. A codebook, where a phrase is assigned a specific arbitrary meaning, is a different method. Phrases from a codebook are much harder to decode, because nothing in the phrase itself carries the meaning. Suppose that the authorities determine, from context, that the text “fry lv” should be taken as “fiery lava”. What does that phrase correspond to in the codebook? Execute attack plan “L” per previously agreed-upon schedule? Or run like hell, the FBI is on to us?
Expressed another way, citizens are being asked to surrender the benefits of strong encryption on the assumption that the bad guys are amateurs, not pros, and don’t have access to pros. But a different type of bad guy, the ones interested in stealing my data for their own financial gain, already employs pros. Very good pros. Let’s not weaken commercially available encryption and make life easier for one kind of bad guy just because we’re afraid of a different sort of villain. And in particular, if we’re going to go ahead and outlaw strong ciphers, let’s write laws that do that narrowly instead of potentially entangling a lot of other data manipulations.