I haven't been posting many new articles to the blog lately, and I thought it was about time I explained why. Aside from the fact that things have gotten busier at the office and at home, I've also been channeling what little energy I do have into a few projects. The first is publicizing my spam-inspired cartoon site, the next is publicizing my site to help bloggers find ideas, and the last (which is the point of this little missive) is trying to crack a very old cipher written by a serial killer about 30 years ago.
The serial killer in question is the Zodiac Killer, who operated in the San Francisco area in the 1960s and 1970s. He killed an unknown number of people, but claimed credit for a number in the double digits. In spite of the fact that he taunted police by writing letters to them and to the news media, he was never caught. Among the communications received from him was a 340-character cipher which has never been cracked (at least it isn't publicly believed to have been cracked). I decided to take a crack at it.
I should begin by stating that I am not a cryptographer or any kind of expert in the subject. I'm a computer geek, to be sure, but I have no special training or background in such things. Regardless, I do have a morbid curiosity to know what this cipher says and what it might reveal about the killer. I'm also very curious to see if I can design and write a program which will crack this cipher.
Having read a bit about cryptography, I know that letters appear in English texts with a pretty consistent frequency. I know that there are also certain pairs of letters which tend to appear together ("bigrams") and certain 3-letter combinations which tend to appear together ("trigrams"). Cryptanalysts use this information to help them find the key used to decrypt messages. I've found and made use of this same data in the work I've done thus far.
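As a rough illustration of what gathering these statistics involves (not the program discussed in this series), a minimal Python sketch might look like the following. The function name `ngram_counts` and the sample sentence are made up for the example, and the sketch assumes that spaces and punctuation are dropped before counting, so sequences can span word boundaries.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every run of n consecutive letters, ignoring case and non-letters."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    return Counter("".join(letters[i:i + n]) for i in range(len(letters) - n + 1))

# Hypothetical sample text, chosen only to show the three kinds of counts.
sample = "The theory is that the thief hid there"

print(ngram_counts(sample, 1).most_common(3))  # single letters
print(ngram_counts(sample, 2).most_common(3))  # bigrams
print(ngram_counts(sample, 3).most_common(3))  # trigrams
```

Dividing each count by the total number of n-grams turns the counts into frequencies that can be compared between texts of different lengths.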
I began by analyzing the known writings of the Zodiac Killer, verifying that his letter frequencies match typical English letter frequencies (they do), that the bigrams and trigrams in his writing occur at approximately the same rates as in normal English texts (they do), and building a list of the "vocabulary" he used in previous messages. Armed with this information, I was fairly confident that if in the future I do crack this cipher, any computer program I write should be able to use standard cryptanalysis tactics to identify a break.
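One common way to check whether a text's letter frequencies "match" English is a chi-squared score, where a lower score means a closer match. The sketch below is only an illustration under that assumption; the post doesn't say which comparison was actually used. The percentages in `ENGLISH_FREQ` are approximate, commonly quoted values, and both sample strings are made up.

```python
from collections import Counter

# Approximate letter frequencies in English text, in percent (assumed values).
ENGLISH_FREQ = {
    "E": 12.7, "T": 9.1, "A": 8.2, "O": 7.5, "I": 7.0, "N": 6.7, "S": 6.3,
    "H": 6.1, "R": 6.0, "D": 4.3, "L": 4.0, "C": 2.8, "U": 2.8, "M": 2.4,
    "W": 2.4, "F": 2.2, "G": 2.0, "Y": 2.0, "P": 1.9, "B": 1.5, "V": 1.0,
    "K": 0.8, "J": 0.15, "X": 0.15, "Q": 0.10, "Z": 0.07,
}

def chi_squared(text):
    """Score how far the text's letter counts are from English; lower is closer."""
    letters = [c for c in text.upper() if c in ENGLISH_FREQ]
    total = len(letters)
    if total == 0:
        raise ValueError("text contains no letters")
    counts = Counter(letters)
    score = 0.0
    for letter, pct in ENGLISH_FREQ.items():
        expected = total * pct / 100  # count we would expect in English
        score += (counts[letter] - expected) ** 2 / expected
    return score

english_like = "it was the best of times it was the worst of times"
gibberish = "zqxjzqxjkvzqxjzqxjkvzqxj"
print(chi_squared(english_like) < chi_squared(gibberish))
```

The same score can be applied to a candidate decryption: output that scores close to English is worth a closer look, while output that scores like the gibberish string can be discarded.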
The encoded message in question is referred to by analysts of the Zodiac Killer as "the 340 cipher" because it is written as 20 rows of 17 symbols each (20 x 17 = 340). There appear to be 62 individual symbols and/or letters used in the message. It is likely that the Zodiac used the "extra" symbols to make it harder to identify the most commonly used letters in English (e.g., he may have used 4-5 symbols each to represent the letter "E" and the letter "T").
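To see why extra symbols hide the common letters, here is a small Python sketch of that kind of substitution, where a frequent letter gets several symbols. Everything in it is hypothetical: the integers stand in for the cipher's symbols, the tiny key covers only five letters, and the symbols for a letter are used in turn (a random pick would work as well).

```python
from collections import Counter
from itertools import cycle

# Hypothetical key: "E" gets four symbols, "T" two, the rest one each.
KEY = {"E": [1, 2, 3, 4], "T": [5, 6], "S": [7], "H": [8], "R": [9]}

def homophonic_encrypt(plaintext, key):
    """Replace each letter with the next of its symbols, taken in turn."""
    pickers = {letter: cycle(symbols) for letter, symbols in key.items()}
    return [next(pickers[c]) for c in plaintext.upper() if c in pickers]

def homophonic_decrypt(symbols, key):
    """Map each symbol back to the single letter it stands for."""
    reverse = {s: letter for letter, syms in key.items() for s in syms}
    return "".join(reverse[s] for s in symbols)

plaintext = "SEE THE TREES HERE"
ciphertext = homophonic_encrypt(plaintext, KEY)

# "E" dominates the plaintext, but no single symbol dominates the ciphertext.
print(Counter(c for c in plaintext if c.isalpha()).most_common(1))
print(Counter(ciphertext).most_common(1))
print(homophonic_decrypt(ciphertext, KEY))
```

Each symbol still means exactly one letter, so decryption is unambiguous for anyone holding the key, but a simple count of symbols no longer points straight at "E".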
Before I could begin instructing a computer to attack this cipher, I had to make some assumptions about it, which I fully recognize could be completely wrong. Still, I had to start somewhere if I was going to break the thing. My working assumptions at this point are the following:
- The killer's previous ciphers were all simple substitution ciphers (i.e., the killer substituted one letter or symbol for another, and any time he used the same symbol it meant the same letter).
- The killer's previous ciphers were all written in English, and thus this cipher is also in English.
- The cipher contains an actual message and isn't just random scribbling that the killer sent to annoy the police and cryptanalysts.
- When properly deciphered, the message will yield a string of words with no punctuation in them, just like the killer's prior ciphers.
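Taken together, these assumptions mean that testing a candidate key is mechanical: replace each symbol with its letter and read off one unbroken string. The Python sketch below illustrates that step only. The grid, the key, and the `decipher` function are all made up, and it further assumes the symbols are read left to right, top to bottom.

```python
def decipher(rows, key, unknown="?"):
    """Apply a symbol-to-letter key to a grid of symbols, row by row."""
    return "".join(key.get(symbol, unknown) for row in rows for symbol in row)

# Hypothetical 2 x 4 grid; the real cipher would be 20 rows of 17 symbols.
rows = ["@#%&", "%*+@"]

# Hypothetical partial key: two symbols ("&" and "*") both stand for "S",
# and "+" has not been assigned a letter yet.
key = {"@": "T", "#": "H", "%": "I", "&": "S", "*": "S"}

print(decipher(rows, key))
```

Symbols without a guess come out as "?", which makes it easy to see how much of the message a partial key covers.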
In the next article, I'll discuss the method I used to build a program to try to crack this cipher.