Posts Tagged ‘340 cipher’

Starting Over With the Zodiac 340 Cipher

April 11th, 2007

I’ve gotten a lot of email over the last few months regarding my work at breaking The Zodiac Killer’s 340 Cipher.  I’m going to try to share my thought processes from the start, so you will understand what I’m doing and why… hopefully.

Let’s start by looking at the 340 Cipher itself.  This image comes from zodiackiller.com, which I recommend checking out if you have an interest in the cipher or the killer:

Since many of the symbols in the message aren’t characters you can type on a computer keyboard, I decided to replace each of the 62 unique symbols with a unique number from 1 to 62, as illustrated below:

Having done this, I was ready to “translate” the cipher into something I could easily work with in a computer.

I used a spreadsheet to determine how many of each symbol appeared in the message.  Then I re-ordered them based on frequency, from most-frequent to least-frequent.  My thinking is that if I focus my efforts on the symbols appearing most in the message, I have the best chance of breaking a large chunk of it in one go.  Here’s what I came up with:

Symbol Count Rank
19 24 1
20 12 2
5 11 3
50 11 4
11 10 5
16 10 6
23 10 7
36 10 8
51 10 9
40 9 10
3 8 11
6 7 12
21 7 13
31 7 14
37 7 15
7 6 16
8 6 17
13 6 18
15 6 19
26 6 20
28 6 21
29 6 22
30 6 23
9 5 24
10 5 25
14 5 26
17 5 27
18 5 28
22 5 29
33 5 30
34 5 31
38 5 32
54 5 33
55 5 34
1 4 35
4 4 36
25 4 37
27 4 38
32 4 39
39 4 40
41 4 41
42 4 42
44 4 43
47 4 44
48 4 45
2 3 46
43 3 47
46 3 48
52 3 49
53 3 50
57 3 51
60 3 52
12 2 53
24 2 54
35 2 55
45 2 56
49 2 57
56 2 58
58 2 59
61 2 60
62 2 61
59 1 62
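
The spreadsheet step above can be reproduced in a few lines of Python. This is only a rough sketch: the function name `rank_symbols` and the toy input are made up for illustration, and it assumes the cipher has already been transcribed as a flat list of symbol numbers. Ties are broken by the lower symbol number, which is how the table above happens to be ordered.

```python
from collections import Counter

def rank_symbols(symbols):
    """Return {symbol: rank}, where rank 1 is the most frequent symbol.

    Ties are broken by the lower symbol number.
    """
    counts = Counter(symbols)
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {symbol: rank for rank, symbol in enumerate(ordered, start=1)}

# Hypothetical toy transcription, NOT the real cipher text.
sample = [19, 20, 19, 5, 19, 20, 50, 5]
ranks = rank_symbols(sample)
print(ranks)
```

Run against the full 340-symbol transcription, the same function should reproduce the Rank column of the table.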

With this new “ranking” scheme, I re-translated the message from the original numbering, so that each number below is a symbol’s frequency rank (1 is the most common symbol, 62 the rarest):

35 46 11 36 3 12 16 17 24 25 5 53 18 26 19 6 27
28 3 1 2 13 29 7 54 37 20 38 21 22 23 14 39 30
2 31 55 8 15 1 32 40 19 20 13 30 18 29 10 35 41
42 3 3 47 16 12 43 23 17 56 3 7 1 1 11 14 6
48 44 15 1 10 45 57 27 5 4 9 24 1 24 49 25 50
3 43 11 16 9 12 7 33 23 27 34 25 9 36 6 37 13
29 4 1 14 58 54 51 6 32 8 59 19 17 21 10 18 5
13 19 6 41 39 57 29 7 1 48 28 38 10 1 62 18 44
27 22 15 1 52 1 40 11 6 9 2 8 31 60 61 49 14
33 10 12 32 17 1 16 41 1 7 3 47 22 9 2 31 33
32 1 11 50 4 45 46 5 37 38 2 3 52 26 15 14 7
6 22 8 12 11 41 5 23 4 26 4 15 21 1 24 2 9
10 61 44 42 31 29 1 28 5 4 9 2 8 13 51 43 11
12 19 9 28 16 39 4 6 4 52 21 8 17 4 45 1 1
31 2 59 53 23 55 49 44 34 46 36 17 32 40 4 33 1
5 8 21 56 10 2 14 13 7 3 16 21 39 15 58 19 6
11 8 26 1 18 4 6 34 22 1 9 12 20 2 5 30 18
1 1 30 20 34 10 20 8 24 7 42 35 26 50 13 30 3
5 9 25 27 20 22 47 45 2 48 38 7 2 23 33 34 8
36 15 37 35 28 3 25 42 10 40 7 43 60 5 14 51 1
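
The re-translation itself is a simple lookup. A minimal sketch, assuming the original transcription is stored as a list of rows of symbol numbers; the `retranslate` name and the sample row are hypothetical, and only a handful of entries from the ranking table are included here.

```python
def retranslate(rows, ranks):
    """Replace each original symbol number with its frequency rank."""
    return [[ranks[symbol] for symbol in row] for row in rows]

# A few entries from the ranking table (original symbol -> rank).
ranks = {19: 1, 20: 2, 5: 3, 50: 4, 11: 5}

# Hypothetical row of original symbol numbers, for illustration only.
rows = [[11, 19, 5, 50, 20, 19]]
print(retranslate(rows, ranks))
```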

It was now time to start making the assumptions under which I would try to crack this cipher.

In earlier ciphers, the Zodiac wrote in English.  I assume that this cipher is the same.  In earlier ciphers, he used more than one symbol to represent the same letter.  I’ll assume that with 62 symbols in this message he’s got at least a couple of alphabets’ worth of letters here, maybe even numeric digits.  I’ll also assume he’s continuing to use the same intentional misspellings he used in earlier messages, like “paradice” instead of “paradise”.  And I’ll assume that he’s using words and phrases he’s used before, since we all tend to write in a certain voice that doesn’t change much.

My next step was to go through his previous writings on zodiackiller.com and glean from them all the words he used in previous messages.  I could include the rest of the English language, but that would slow my program down and might generate false hits on words he never used.

Looking at the Zodiac’s previous writings, I was able to determine how often each letter typically appeared in his writings.  The distribution of A, B, C, etc., was approximately the same as for “normal” English texts so there was no special help here.  However, knowing this would allow me to include a computationally-fast method for rejecting keys that were unlikely to be correct.  For example, if a given potential cipher key generated 200 E’s in the finished text, it’s probably not correct, since we wouldn’t expect 200 of the 340 cipher symbols to translate to the letter E.  In fact, we’d expect about 12%, or roughly 40.  By adding up the number of A’s, B’s, C’s, etc., a given key would generate (which is just the sum of the counts of the symbols assigned to each letter), I could quickly reject a “bad” key without having to decode the text or scan it for dictionary words.  That would save computer time.
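
Here is a rough Python sketch of that quick rejection test. It is not the actual program: the function names, the handful of approximate letter shares, the 2% fallback share, and the "twice the expected count" cutoff are all assumptions chosen for illustration. The symbol counts are the nine most frequent entries from the ranking table, keyed by rank.

```python
from collections import Counter

# Approximate English letter shares; only a few letters shown for brevity.
EXPECTED_SHARE = {"E": 0.12, "T": 0.09, "A": 0.08}

def letter_totals(key, symbol_counts):
    """Sum symbol counts per plaintext letter without decoding the message."""
    totals = Counter()
    for symbol, letter in key.items():
        totals[letter] += symbol_counts[symbol]
    return totals

def is_plausible(key, symbol_counts, message_length=340, tolerance=2.0):
    """Reject a key that yields far more of any letter than expected.

    tolerance is a hypothetical cutoff: up to 2x the expected count passes.
    Letters missing from EXPECTED_SHARE fall back to an assumed 2% share.
    """
    totals = letter_totals(key, symbol_counts)
    for letter, total in totals.items():
        expected = EXPECTED_SHARE.get(letter, 0.02) * message_length
        if total > expected * tolerance:
            return False
    return True

# Counts of the nine most frequent symbols, taken from the ranking table.
symbol_counts = {1: 24, 2: 12, 3: 11, 4: 11, 5: 10, 6: 10, 7: 10, 8: 10, 9: 10}

bad_key = {symbol: "E" for symbol in symbol_counts}  # every symbol becomes E
good_key = {1: "E", 2: "T", 3: "A"}                  # a partial key

print(is_plausible(bad_key, symbol_counts))
print(is_plausible(good_key, symbol_counts))
```

The point of the design is that the check only touches 62 symbol counts per key, never the 340-symbol message itself.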

admin Life

Cracking the Zodiac Killer’s “340 Cipher” Part 4

May 22nd, 2006

An interesting thing happened today.  I was playing the Da Vinci Code Quest for Eurostar when I reached a puzzle that was based on ciphers.  The puzzle itself wasn’t that exciting, but look at the artwork they used in the background behind the puzzle:

Those seemingly cryptic symbols at the top of the image above should look familiar to you.  If they don’t, consider the following image from zodiackiller.com of the infamous “340 Cipher”:

The topmost visible line in the Eurostar page is approximately the tenth line down in the 340 cipher.

I found it interesting, if a bit grisly, that the Eurostar folks are using a serial killer’s cipher as an illustration for their contest.  Wonder what the Da Vinci Code folks would think if they knew?


admin Life

Cracking the Zodiac Killer’s “340 Cipher” Part 3

May 7th, 2006

In the last installment, I told you something about the shortcuts and general flow of execution of the custom Visual Basic program I’ve written to try to crack The Zodiac Killer’s thus-far-unbroken cipher known colloquially as “The 340 Cipher”.  This time around I’ll tell you a bit more about the program itself.

The program makes all of its “deciphering” decisions based on what I’m calling a “gated scoring algorithm” intended to expend the least possible effort determining that a potential “decode” of the message really does look like a decode.  The algorithm works something like this:

  • First, the program counts the frequency of letters in the “decoded” message.  If the letters which appear most commonly in normal English writings appear in approximately the right frequency in the message, it generates a base score.  If that base score is too low (i.e., there are not enough of the most-common letters in the message), scoring stops here before too many processor cycles are used.
  • If the program finds “approximately” the right frequency of letters in the message, it then gets more granular about the letters it’s looking for.  It makes sure that in the decoded message it finds the approximately-correct frequency of As, Bs, Cs, etc.  Each letter that is occurring in approximately the right frequency (+ or – 20% of normal English text) gets a higher score than those which don’t.  Any letter appearing “too often” deducts from the total score (e.g., lots of “Zs” would drop the overall score).  If letters appear in the “decoded” message in approximately the right frequency, scoring continues. Otherwise, it stops here before more cycles are wasted.
  • If the frequency of individual letters looks good, the program looks at the most common bigraphs in the English language and compares these to the message.  If it finds approximately the right percentage, scoring moves on. Otherwise, it stops.
  • If the frequency of bigraphs looks about right, it then looks at a more granular list of bigraphs, trigraphs, and quadgraphs and scores the message based on whether these seem to be appearing in about the right amounts for a normal English text.  If so, the score is increased. If not, it isn’t.  If the score isn’t high enough, no more scoring effort is performed.
  • Assuming the breakdown of bigraphs, trigraphs, and quadgraphs is within a reasonable tolerance from “normal” English text, the program then pores through a 20,000-word English dictionary, going from the longest to the shortest words. This dictionary provides a score for each word, with added weight given to those words the killer used often that don’t occur normally in English writing (like the killer’s tendency to misspell “having” as “haveing”).  This part of the scoring process can take several seconds of elapsed time to complete, so it is only done by the program when there is a very good chance of finding lots of English words in the text.

I call this a “gated scoring algorithm” because the potential “decode” of the message must achieve a certain predefined score before it can get through the “gate” into a more time-consuming scoring method.  This method allows the program to “fly” past potential decodes that are worthless (like something that generates nothing but “QZCZQ” type text) and spend the most time on “decodes” that statistically look like English text.
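
The gating idea can be sketched in Python as a list of scoring stages, cheapest first, each with a minimum score. This is an illustration rather than the Visual Basic program: the two stage functions, the short letter and bigram lists, and the thresholds are all placeholder assumptions.

```python
COMMON_LETTERS = set("ETAOINSHR")
COMMON_BIGRAMS = {"TH", "HE", "IN", "ER", "AN", "RE", "ON", "AT"}

def common_letter_share(text):
    """Fraction of characters that are among the most common English letters."""
    return sum(ch in COMMON_LETTERS for ch in text) / max(len(text), 1)

def common_bigram_share(text):
    """Fraction of adjacent letter pairs that are common English bigrams."""
    pairs = [text[i:i + 2] for i in range(len(text) - 1)]
    return sum(p in COMMON_BIGRAMS for p in pairs) / max(len(pairs), 1)

def gated_score(text, gates):
    """Run scoring stages in order, stopping at the first failed gate.

    gates is a list of (score_function, minimum_score) pairs, cheapest first.
    Returns (total_score, number_of_gates_passed).
    """
    total = 0.0
    for passed, (score_fn, minimum) in enumerate(gates):
        score = score_fn(text)
        total += score
        if score < minimum:
            return total, passed  # later, costlier stages never run
    return total, len(gates)

# Hypothetical thresholds; real values would need tuning.
GATES = [(common_letter_share, 0.5), (common_bigram_share, 0.1)]

print(gated_score("QZCZQQZCZQ", GATES))
print(gated_score("THERAINISHERE", GATES))
```

A slow dictionary search would simply be appended as the last entry in `GATES`, so it only runs on text that has already cleared the cheap statistical checks.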

admin General Computer Topics

Cracking the Zodiac Killer’s “340 Cipher” Part 2

May 5th, 2006

As I mentioned in yesterday’s article, I am working to crack the “340 cipher” sent to police by the Zodiac Killer, who operated in the 1960s and 1970s in California. I also mentioned the assumptions I’ve made about the message (which could well be wrong) and the staggering size of the potential solution space. Clearly I needed to shortcut that 100-year process as much as possible since it’s very unlikely I’ll live to be 140 to see the end of it.

Some of the shortcuts I can take include:

  • I know that all of the symbols can’t translate to the exact same letter, though it’s highly likely that several of them represent the same letter. Thus, I can (probably) discard any potential message key that has “too many” of the same letter. That reduces the size of the solution space a good bit.
  • I know that when the message is cracked, it’s highly likely that there will be a pretty standard breakdown of the letters as seen in typical English texts. By spending a minimal amount of time on any key that generates a “possible solution” of the message whose character breakdown is too much “off” from that breakdown, I can speed up my trip through the solution space.
  • I know that when the message is cracked, it should contain a certain percentage of the most popular bigrams (2-letter combinations) and trigrams (3-letter combinations) found in English texts. By checking a possible solution against those percentages, I can avoid wasting time on “solutions” which are filled with unlikely bigrams (such as “QZ”) and trigrams (“QZQ”).
  • I know that the message is most likely written in English, so I can build a dictionary of the English language from online sources and compare any possible solution which has the right breakdown of characters, bigrams, and trigrams against that dictionary to see how many real words are in the message. The more words we find in the possible solution we’re looking at, the more likely I’ll have found the “right” solution.
  • Based on my analysis of the enciphered message, there are some symbols that occur too frequently to be likely to be letters like Z, Q, or X. When trying potential keys, I can discard those which are attempting to replace those symbols with characters they’re unlikely to be.
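
The last shortcut can be expressed as a constraint applied while keys are generated. A minimal sketch, assuming a hypothetical cutoff of 10 occurrences for "too frequent"; the function name and the cutoff are illustrative, and only the three letters named above are excluded.

```python
ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
RARE_LETTERS = set("ZQX")

def allowed_letters(symbol_count, frequent_cutoff=10):
    """Letters a symbol may stand for, given how often it appears.

    Symbols seen at least frequent_cutoff times are assumed not to be
    one of the rare letters.
    """
    if symbol_count >= frequent_cutoff:
        return ALPHABET - RARE_LETTERS
    return set(ALPHABET)

print(sorted(allowed_letters(24)))  # the most frequent symbol: no Z, Q, or X
print(len(allowed_letters(2)))      # a rare symbol: all 26 letters allowed
```

Keys that never assign a rare letter to a frequent symbol are never generated at all, which is cheaper than generating them and scoring them afterwards.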

Not being a mathematician (I blame the lousy Calculus teachers I had at Syracuse University for squelching my confidence in my ability to do math), I have no idea just how much the above will cut down my solution space. Still, I expect it slices the space down pretty handily overall. To implement the above rules, I developed a scoring system that requires a potential solution to pass through a number of “gates” before going on to an analysis that is more thorough or more time-consuming. The scoring method works something like this:

  • I generate a possible message key.
  • I attempt to decrypt the message using that key.
  • I run the potential solution through a character counter to verify that it has approximately the right number of the “most common” characters. If not, I move on to the next key.
  • I compare the number of individual characters found against their typical frequencies in English. If there are far more or far fewer of a given character than expected, I move on to the next key.
  • I compare the message against the most common bigrams and trigrams used in English. If those don’t occur in approximately the right proportions, I move on to the next key.
  • I compare the message to a dictionary of 20,000 English words. I weight the scoring in favor of larger words, and heavily in favor of words the killer used most frequently (especially those he liked to misspell). The more words I find and the larger the words are, the higher the “score” I get for the message.
  • I analyze the percentage of the characters in the solution that are “swallowed up” by the words I found in the message. The higher the percentage of “words to overall characters” the more likely this key is to have broken the message, so the higher the score will be.
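
The last two steps can be sketched as a greedy, longest-first word search over the unpunctuated text. This is one possible reading of the description, not the program itself: the function name, the tiny word list, and the weights are hypothetical, and the score here is simply weight times word length.

```python
def word_score(text, weighted_words):
    """Greedy longest-first word search over text with no spaces.

    weighted_words maps word -> weight. Returns (score, coverage), where
    coverage is the fraction of characters "swallowed up" by found words.
    """
    covered = [False] * len(text)
    score = 0.0
    for word in sorted(weighted_words, key=len, reverse=True):
        start = text.find(word)
        while start != -1:
            end = start + len(word)
            if not any(covered[start:end]):  # skip overlaps with longer words
                covered[start:end] = [True] * len(word)
                score += weighted_words[word] * len(word)
            start = text.find(word, start + 1)
    coverage = sum(covered) / max(len(text), 1)
    return score, coverage

# Hypothetical mini-dictionary; the killer's misspellings get extra weight.
words = {"PARADICE": 3.0, "HAVEING": 3.0, "THIS": 1.0, "HAVE": 1.0, "IS": 1.0}

score, coverage = word_score("THISISHAVEINGPARADICEXQ", words)
print(score, coverage)
```

Searching longest words first keeps a short word like “HAVE” from claiming characters that belong to “HAVEING”.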

In the next installment, I’ll talk more about the program’s logic to try and accomplish the above.

admin General Computer Topics

Cracking the Zodiac Killer’s “340 Cipher”

May 4th, 2006

I’ve not been posting a lot of new articles into the blog lately, and I thought it was about time I explained why. Aside from the fact that things have gotten busier at the office, and at home, I’ve also been channeling what little energy I do have into a few projects. First is to publicize my spam-inspired cartoon site, next is to publicize my site to help bloggers find ideas, and finally (which is the point of this little missive) to try and crack a very old cipher written by a serial killer about 30 years ago.

The serial killer in question is the Zodiac Killer, who operated in the San Francisco area in the 1960s and 1970s. He killed an unknown number of people, but took credit for double-digit numbers. In spite of the fact that he taunted police by writing letters to them and to the news media, he was never caught. Among the communications received from him was a 340-character cipher which has never been cracked (at least it isn’t publicly believed to have been cracked). I decided to take a crack at it.

I should begin by stating that I am not a cryptographer or any kind of an expert in the subject. I’m a computer geek, to be sure, but have no special training or background in such things. Regardless, I do have a morbid curiosity to know what this cipher says and what it might reveal about the killer. I’m also very curious to see if I can design and write a program which will crack this cipher.

Having read a bit about cryptography, I know that there is a pretty consistent frequency with which letters appear in English texts. I know that there are also certain pairs of letters which tend to appear together (“bigrams”) and certain 3-letter combinations which tend to appear more frequently together (“trigrams”). Cryptanalysts use this information to help them find the key used to decrypt messages. I’ve found and made use of this same data in the work I’ve done thus far.

I began by analyzing the known writings of the Zodiac Killer, verifying that his letter frequencies match typical English letter frequencies (they do), that the bigrams and trigrams in his writing occur approximately the same as in normal English texts (they do), and building a list of his “vocabulary” used in previous messages. Armed with this information, I was fairly confident that if in the future I do crack this cipher, any computer program I write should be able to use standard cryptanalysis tactics to identify a break.

The encoded message in question is referred to by analysts of the Zodiac Killer as “the 340 cipher” because it is written as 20 rows of 17 symbols each (20 x 17 = 340). There appear to be 62 individual symbols and/or letters used in the message. It is likely that the Zodiac used the “extra” symbols to make it harder to identify the most commonly used letters in English (e.g., he may have used 4-5 symbols to represent the letter “E” and the letter “T”).
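
To see why extra symbols hide the common letters, here is a small Python illustration. The key, the plaintext, and the rule of cycling through a letter's symbols in order are all made up for the example; nothing here is known about how the Zodiac actually chose among his symbols.

```python
from collections import Counter
from itertools import cycle

def encipher(plaintext, homophones):
    """Encipher text, cycling through each letter's list of symbols."""
    pools = {letter: cycle(symbols) for letter, symbols in homophones.items()}
    return [next(pools[letter]) for letter in plaintext]

# Hypothetical key: E gets four symbols, the other letters one each.
key = {"E": [1, 2, 3, 4], "T": [5], "H": [6], "R": [7]}

plaintext = "THETREEHERE"
cipher = encipher(plaintext, key)

print(Counter(plaintext).most_common(1))  # E clearly dominates the plaintext
print(Counter(cipher).most_common())      # no symbol stands out in the cipher
```

In the plaintext E appears five times, but in the cipher no single symbol appears more than twice, so a plain frequency count no longer points at E.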

Before I could begin instructing a computer to attack this cipher, I had to make some assumptions about it, which I fully recognize could be completely wrong. Still, I had to start somewhere if I was going to break the thing. My working assumptions at this point are the following:

    • The killer’s previous ciphers were all simple substitution ciphers (i.e., the killer substituted one letter or symbol for another, and any time he used the same symbol it meant the same letter).
    • The killer’s previous ciphers are all written in English, and thus this cipher is also in English.
    • The cipher contains an actual message and isn’t just random scribbling that the killer sent to annoy the police and cryptanalysts.
    • When properly deciphered, the message will yield a string of words with no punctuation in them, just like the killer’s prior ciphers.
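
Under these assumptions, applying a candidate key is a straight lookup that yields one unpunctuated string. A minimal sketch; the `decipher` name, the toy grid, and the partial key are hypothetical.

```python
def decipher(rows, key, unknown="?"):
    """Apply a symbol -> letter key to the grid and return one flat string.

    Symbols missing from the key are shown as the unknown marker.
    """
    return "".join(key.get(symbol, unknown) for row in rows for symbol in row)

# Toy 2 x 3 grid and partial key, for illustration only.
rows = [[1, 2, 3], [1, 4, 2]]
key = {1: "T", 2: "H", 3: "E"}
print(decipher(rows, key))
```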

In the next article, I’ll discuss the method I used to build a program to try to crack this cipher.

admin General Computer Topics