April 2007 Archives

Starting Over With the Zodiac 340 Cipher


I've gotten a lot of email over the last few months regarding my work on breaking the Zodiac Killer's 340 Cipher. I'm going to try to share my thought process from the start, so you will understand what I'm doing and why... hopefully.

Let's start by looking at the 340 Cipher itself.  This image comes from zodiackiller.com, which I recommend checking out if you have an interest in the cipher or the killer:


Since many of the symbols in the message aren't characters you can type on a computer keyboard, I decided to replace each of the 62 unique symbols with a unique number from 1 to 62, as illustrated below:

Having done this, I was ready to "translate" the cipher into something I could easily work with in a computer.

I did this by bringing up a graphic editor on my PC and placing the appropriate number beneath each symbol, as illustrated below:

[Image: the cipher with the assigned number beneath each symbol]

This left me with a numeric version of the cipher. I used a spreadsheet to count how many times each symbol appeared in the message. Then I re-ordered the symbols by frequency, from most frequent to least frequent. My thinking is that if I focus my efforts on the symbols appearing most often in the message, I have the best chance of breaking a large chunk of it in one go. Here's what I came up with:

Symbol  Count  Rank
19      24     1
20      12     2
5       11     3
50      11     4
11      10     5
16      10     6
23      10     7
36      10     8
51      10     9
40      9      10
3       8      11
6       7      12
21      7      13
31      7      14
37      7      15
7       6      16
8       6      17
13      6      18
15      6      19
26      6      20
28      6      21
29      6      22
30      6      23
9       5      24
10      5      25
14      5      26
17      5      27
18      5      28
22      5      29
33      5      30
34      5      31
38      5      32
54      5      33
55      5      34
1       4      35
4       4      36
25      4      37
27      4      38
32      4      39
39      4      40
41      4      41
42      4      42
44      4      43
47      4      44
48      4      45
2       3      46
43      3      47
46      3      48
52      3      49
53      3      50
57      3      51
60      3      52
12      2      53
24      2      54
35      2      55
45      2      56
49      2      57
56      2      58
58      2      59
61      2      60
62      2      61
59      1      62
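
For anyone who would rather script the counting than use a spreadsheet, here is a minimal sketch in Python. It is not the spreadsheet I used. The function name `rank_symbols` and the sample data are made up for illustration, and it assumes ties are broken by the lower symbol number, which is how the table above happens to be ordered.

```python
from collections import Counter

def rank_symbols(symbols):
    """Return (symbol, count, rank) rows, most frequent symbol first.

    Assumption: ties in count are broken by ascending symbol number.
    """
    counts = Counter(symbols)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(sym, cnt, rank) for rank, (sym, cnt) in enumerate(ordered, start=1)]

# Toy input for illustration only; this is NOT the real cipher text.
sample = [19, 20, 19, 5, 19, 20, 50]

print("Symbol  Count  Rank")
for sym, cnt, rank in rank_symbols(sample):
    print(f"{sym:<7} {cnt:<6} {rank}")
```

Fed the full 340-number transcription, the same function would produce a table in the layout shown above.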


With this new "ranking" scheme, I re-translated the message from the original numbering to this:

3546113631216172425553182619627
283121329754372038212223143930
2315581513240192013301829103541
423347161243231756371111146
48441511045572754924124492550
34311169127332327342593663713
29411458545163285919172110185
131964139572971482838101621844
2722151521401169283160614914
3310123217116411734722923133
3211150445465373823522615147
62281211415234264152112429
1061444231291285492813514311
12199281639464522181744511
3125953235549443446361732404331
5821561021413731621391558196
11826118463422191220253018
11302034102082474235265013303
5925272022474524838722333348
361537352832542104074360514511
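
The renumbering step can be sketched the same way. This is only an illustration under my own assumptions: `renumber_by_rank` and `format_rows` are hypothetical names, the input is toy data, and the default row width of 17 simply comes from 340 symbols laid out in 20 rows.

```python
from collections import Counter

def renumber_by_rank(symbols):
    """Replace each symbol number with its frequency rank (1 = most frequent).

    Assumption: ties in count are broken by ascending symbol number.
    Returns the renumbered message and the symbol-to-rank mapping.
    """
    counts = Counter(symbols)
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    rank_of = {sym: rank for rank, sym in enumerate(ordered, start=1)}
    return [rank_of[s] for s in symbols], rank_of

def format_rows(numbers, width=17):
    """Lay the numbers out in rows, separated by spaces."""
    rows = [numbers[i:i + width] for i in range(0, len(numbers), width)]
    return "\n".join(" ".join(str(n) for n in row) for row in rows)

# Toy input for illustration only; this is NOT the real cipher text.
sample = [19, 20, 19, 5, 19, 20, 50]
renumbered, rank_of = renumber_by_rank(sample)
print(format_rows(renumbered, width=3))
```

Keeping a separator between the numbers matters: without one, "1" followed by "2" is indistinguishable from "12".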


It was now time to start making the assumptions under which I would try to crack this cipher.

In earlier ciphers, the Zodiac wrote in English.  I assume that this cipher is the same.  In earlier ciphers, he used more than one symbol to represent the same letter.  I'll assume that with 62 symbols in this message he's got at least a couple of alphabets' worth of letters here, maybe even numeric digits.  I'll also assume he's continuing to use the same intentional misspellings he used in earlier messages, like "paradice" instead of "paradise".  And I'll assume that he's using words and phrases he's used before, since we all tend to write in a certain voice that doesn't change much.

My next step was to go through his previous writings on zodiackiller.com and glean from them all the words he had used. I could include the rest of the English language, but that would slow my program down and might generate false hits on words he never used.
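
Here is a rough sketch of how such a word list might be built in Python. The function name `build_word_list` is hypothetical, and the sample strings are placeholders I wrote for the example, not quotes from the actual letters. Misspellings are kept exactly as written, so "paradice" stays "paradice".

```python
import re
from collections import Counter

def build_word_list(texts):
    """Count every word found in the given texts.

    Words are lowercased and split on anything that is not a letter.
    Spelling is never corrected, so intentional misspellings survive.
    """
    words = Counter()
    for text in texts:
        words.update(re.findall(r"[a-z]+", text.lower()))
    return words

# Placeholder strings for illustration only; NOT real letter text.
sample_texts = [
    "A placeholder sentence that mentions paradice.",
    "Another placeholder, again with paradice in it.",
]

word_list = build_word_list(sample_texts)
print(word_list.most_common(3))
```

Using a `Counter` rather than a plain set keeps the number of times each word was used, which could later help favor his most common words.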

Looking at the Zodiac's previous writings, I was able to determine how often each letter typically appeared. The distribution of A, B, C, etc., was approximately the same as for "normal" English text, so there was no special help here. However, knowing the distribution would let me include a computationally fast method for rejecting keys that were unlikely to be correct. For example, if a potential cipher key generated 200 E's in the finished text, it's probably not correct, since we wouldn't expect 200 of the 340 cipher symbols to translate to the letter E. We'd expect closer to 12%, or about 40. By adding up the number of A's, B's, C's, etc., that a given key would generate, I could quickly reject a "bad" key without having to decode the text or scan it for dictionary words. That would save computer time.
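
A minimal sketch of that rejection test in Python follows. It is an illustration under stated assumptions, not my actual program: the names `letter_totals` and `plausible` are hypothetical, the frequency figures other than the roughly 12% for E are assumed round numbers for English, and the `slack` factor is an arbitrary tolerance. It only checks whether any letter shows up too often, which works even when a key covers only some of the symbols.

```python
from collections import Counter

# Assumed approximate English letter frequencies (fractions of all letters).
# Only the ~12% figure for E comes from the text above.
EXPECTED = {"E": 0.12, "T": 0.09, "A": 0.08, "O": 0.075, "I": 0.07, "N": 0.07}
DEFAULT = 0.06  # assumed loose ceiling for any letter not listed above

def letter_totals(key, symbol_counts):
    """Add up how many of each letter a key would produce.

    key maps symbol number -> letter; symbol_counts maps symbol number ->
    occurrences in the cipher. Symbols missing from the key are skipped.
    The cipher text itself is never decoded.
    """
    totals = Counter()
    for symbol, count in symbol_counts.items():
        if symbol in key:
            totals[key[symbol]] += count
    return totals

def plausible(key, symbol_counts, total=340, slack=1.25):
    """Return False if any letter would appear far more often than expected."""
    for letter, n in letter_totals(key, symbol_counts).items():
        limit = EXPECTED.get(letter, DEFAULT) * total * slack
        if n > limit:
            return False
    return True

# Counts of the four most frequent symbols, taken from the table above.
counts = {19: 24, 20: 12, 5: 11, 50: 11}

good_key = {19: "E", 20: "T", 5: "E", 50: "A"}  # hypothetical key
bad_key = {19: "E", 20: "E", 5: "E", 50: "E"}   # hypothetical key

print(plausible(good_key, counts))
print(plausible(bad_key, counts))
```

Because the check works on 62 symbol counts rather than 340 decoded characters, it costs only a handful of additions per key.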