Cracking the Zodiac Killer's "340 Cipher" Part 2
Written by Michael Salsbury   
Friday, 05 May 2006

As I mentioned in yesterday's article, I am working to crack the "340 cipher" sent to police by the Zodiac Killer, who operated in California in the 1960s and 1970s. I also mentioned the assumptions I've made about the message (which could well be wrong) and the staggering size of the potential solution space. Clearly I need to shortcut that 100-year process as much as possible, since it's very unlikely I'll live to be 140 to see the end of it.

Some of the shortcuts I can take include:

  • I know that the symbols can't all translate to the exact same letter, though it's highly likely that several of them share a letter. Thus, I can (probably) discard any potential message key that assigns "too many" symbols to the same letter. That reduces the size of the solution space a good bit.

  • I know that when the message is cracked, it's highly likely that there will be a pretty standard breakdown of the letters as seen in typical English texts. By spending a minimal amount of time on any key that generates a "possible solution" of the message whose character breakdown is too much "off" from that breakdown, I can speed up my trip through the solution space.

  • I know that when the message is cracked, it should contain a certain percentage of the most popular bigrams (2-letter combinations) and trigrams (3-letter combinations) found in English texts. By checking a possible solution against those percentages, I can avoid wasting time on "solutions" which are filled with unlikely bigrams (such as "QZ") and trigrams ("QZQ").

  • I know that the message is most likely written in English, so I can build a dictionary of the English language from online sources and compare any possible solution which has the right breakdown of characters, bigrams, and trigrams against that dictionary to see how many real words are in the message. The more words I find in a possible solution, the more likely it is that I've found the "right" one.

  • Based on my analysis of the enciphered message, some symbols occur too frequently to plausibly stand for rare letters like Z, Q, or X. When trying potential keys, I can discard those which map such symbols to letters they're unlikely to be.
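
A few of these shortcuts can be sketched in Python. This is only an illustration under stated assumptions, not the actual program: the function names, the thresholds (`max_symbols_per_letter`, `max_count`), and the representation of a key as a dict from cipher symbol to letter are all hypothetical, and the frequency table uses rounded values from commonly published English letter-frequency tables.

```python
from collections import Counter

# Approximate letter frequencies in English text, in percent (rounded,
# illustrative values; any standard table could be substituted).
ENGLISH_FREQ = {
    "E": 12.7, "T": 9.1, "A": 8.2, "O": 7.5, "I": 7.0, "N": 6.7,
    "S": 6.3, "H": 6.1, "R": 6.0, "D": 4.3, "L": 4.0, "C": 2.8,
    "U": 2.8, "M": 2.4, "W": 2.4, "F": 2.2, "G": 2.0, "Y": 2.0,
    "P": 1.9, "B": 1.5, "V": 1.0, "K": 0.8, "J": 0.15, "X": 0.15,
    "Q": 0.10, "Z": 0.07,
}


def key_is_plausible(key, max_symbols_per_letter=8):
    """Reject keys that assign "too many" cipher symbols to one letter.

    `key` maps cipher symbol -> plaintext letter (assumed representation).
    """
    counts = Counter(key.values())
    return max(counts.values(), default=0) <= max_symbols_per_letter


def respects_rare_letters(key, symbol_counts, rare="JQXZ", max_count=5):
    """Reject keys that map a frequently occurring symbol to a rare letter.

    `symbol_counts` maps cipher symbol -> occurrences in the ciphertext.
    """
    return not any(
        letter in rare and symbol_counts.get(symbol, 0) > max_count
        for symbol, letter in key.items()
    )


def frequency_distance(text):
    """Sum of absolute differences (percentage points) from English.

    Smaller values mean the letter breakdown looks more like English.
    """
    letters = [c for c in text.upper() if c in ENGLISH_FREQ]
    if not letters:
        return float("inf")
    counts = Counter(letters)
    total = len(letters)
    return sum(
        abs(100.0 * counts[letter] / total - expected)
        for letter, expected in ENGLISH_FREQ.items()
    )
```

A key that fails `key_is_plausible` or `respects_rare_letters` can be thrown away before any decryption is attempted, and a candidate plaintext with a large `frequency_distance` can be dropped before the more expensive checks run. Where exactly to set the cutoffs is a tuning question this sketch does not answer.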

Not being a mathematician (I blame the lousy Calculus teachers I had at Syracuse University for squelching my confidence in my ability to do math), I have no idea just how much the above will cut down my solution space. Still, I expect it slices the space down pretty handily overall. To implement the above rules, I developed a scoring system that requires a potential solution to pass through a number of "gates" before going on to an analysis that is more thorough or more expensive in computer time. The scoring method works something like this:

  • I generate a possible message key.

  • I attempt to decrypt the message using that key.

  • I run the potential solution through a character counter to verify that it has approximately the right number of the "most common" characters. If not, I move on to the next key.

  • I compare the number of times each individual character occurs against its typical frequency in English. If a given character turns up far more or far less often than expected, I move on to the next key.

  • I compare the message against the most common bigrams and trigrams used in English. If those don't occur in approximately the right proportions, I move on to the next key.

  • I compare the message to a dictionary of 20,000 English words. I weight the scoring in favor of larger words, and heavily in favor of words the killer used most frequently (especially those he liked to misspell). The more words I find and the larger the words are, the higher the "score" I get for the message.

  • I analyze the percentage of the characters in the solution that are "swallowed up" by the words I found in the message. The higher the percentage of "words to overall characters" the more likely this key is to have broken the message, so the higher the score will be.
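
The gate sequence above might be sketched in Python as follows. Everything here is an assumption made for illustration: the function names, the thresholds, the short bigram list, the weights (word length squared, tripled for favorite words), and the way coverage is folded into the final score are hypothetical choices, not the actual program. The per-letter frequency gate and the trigram check are left out to keep the sketch short.

```python
# A handful of bigrams that are common in English text (illustrative list).
COMMON_BIGRAMS = ("TH", "HE", "IN", "ER", "AN", "RE", "ON", "AT", "EN", "ND")


def decrypt(cipher_symbols, key):
    """Substitute each cipher symbol with its letter; '?' marks unmapped ones."""
    return "".join(key.get(symbol, "?") for symbol in cipher_symbols)


def word_score(text, dictionary, favorites=(), min_len=3):
    """Return (score, coverage) for dictionary words found anywhere in text.

    Longer words score more (length squared); words in `favorites` count
    triple. Coverage is the fraction of characters inside at least one word.
    """
    covered = [False] * len(text)
    score = 0
    for word in dictionary:
        if len(word) < min_len:
            continue
        start = text.find(word)
        while start != -1:
            weight = len(word) ** 2
            if word in favorites:
                weight *= 3
            score += weight
            for i in range(start, start + len(word)):
                covered[i] = True
            start = text.find(word, start + 1)
    coverage = sum(covered) / len(text) if text else 0.0
    return score, coverage


def score_candidate(cipher_symbols, key, dictionary, favorites=(),
                    min_common_share=0.3, min_bigram_hits=3):
    """Run one key through the gates; 0.0 means "move on to the next key"."""
    text = decrypt(cipher_symbols, key)
    if not text:
        return 0.0
    # Gate: enough of the most common English letters?
    common = sum(text.count(letter) for letter in "ETAOIN")
    if common / len(text) < min_common_share:
        return 0.0
    # Gate: enough common bigrams?
    if sum(text.count(bigram) for bigram in COMMON_BIGRAMS) < min_bigram_hits:
        return 0.0
    # Expensive step: dictionary words, boosted by how much text they cover.
    score, coverage = word_score(text, dictionary, favorites)
    return score * (1.0 + coverage)
```

Ordering the gates from cheapest to most expensive means most bad keys are rejected after a couple of character counts, and only the rare survivor pays for the dictionary scan. A real run over a 20,000-word dictionary would probably want something faster than repeated `str.find` calls, such as a trie, but that is outside the scope of this sketch.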

In the next installment, I'll talk more about the program logic I'm using to accomplish the above.


Last Updated ( Thursday, 04 May 2006 )