Cracking the Zodiac Killer's "340 Cipher" Part 2
Written by Michael Salsbury
Friday, 05 May 2006
As I mentioned in yesterday's article, I am working to crack the "340
cipher" sent to police by the Zodiac Killer, who operated in the 1960s
and 1970s in California. I also mentioned the assumptions I've made
about the message (which could well be wrong) and the staggering size of
the potential solution space. Clearly I needed to shortcut that 100-year
process as much as possible since it's very unlikely I'll live to be 140
to see the end of it.
Some of the shortcuts I can take include:
-
I know that the symbols can't all translate to the exact same
letter, though it's highly likely that several of them do stand for
the same letter. Thus, I can (probably) discard any potential message
key that assigns "too many" symbols to one letter. That reduces the
size of the solution space a good bit.
-
I know that when the message is cracked, it's highly likely that there
will be a pretty standard breakdown of the letters as seen in typical
English texts. By spending a minimal amount of time on any key that
generates a "possible solution" of the message whose character
breakdown is too much "off" from that breakdown, I can speed up my
trip through the solution space.
-
I know that when the message is cracked, it should contain a certain
percentage of the most popular bigrams (2-letter combinations) and
trigrams (3-letter combinations) found in English texts. By checking a
possible solution against those percentages, I can avoid wasting time
on "solutions" which are filled with unlikely bigrams (such as "QZ")
and trigrams ("QZQ").
-
I know that the message is most likely written in English, so I can
build a dictionary of the English language from online sources and
compare any possible solution which has the right breakdown of
characters, bigrams, and trigrams against that dictionary to see how
many real words are in the message. The more words I find in the
possible solution I'm looking at, the more likely it is that I've
found the "right" solution.
-
Based on my analysis of the enciphered message, some symbols occur
too frequently to be rare letters like Z, Q, or X. When trying
potential keys, I can discard those that map such symbols to letters
they're unlikely to be.
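As a rough illustration, the two key-level shortcuts above (too many symbols mapped to one letter, and frequent symbols mapped to rare letters) might be sketched in Python like this. The function name, the thresholds, and the set of "rare" letters are all assumptions made for the example, not the actual program.

```python
from collections import Counter

# Assumption: letters treated as "too rare" for a frequently used symbol.
RARE_LETTERS = set("ZQXJ")


def key_is_plausible(key, symbol_counts,
                     max_symbols_per_letter=7, frequent_cutoff=10):
    """Cheap sanity check on a candidate key before any decryption.

    key           -- dict mapping each cipher symbol to a plaintext letter
    symbol_counts -- dict mapping each cipher symbol to its count in the message
    Both thresholds are illustrative guesses, not tuned values.
    """
    # Shortcut 1: reject keys that pile too many symbols onto one letter.
    per_letter = Counter(key.values())
    if max(per_letter.values(), default=0) > max_symbols_per_letter:
        return False

    # Shortcut 2: reject keys that give a frequent symbol a rare letter.
    for symbol, letter in key.items():
        if symbol_counts.get(symbol, 0) >= frequent_cutoff and letter in RARE_LETTERS:
            return False
    return True


# Toy example: "A" and "B" are the frequent symbols in this made-up message.
symbol_counts = Counter("ABABCABD")
good_key = {"A": "E", "B": "T", "C": "Q", "D": "S"}
bad_key = {"A": "Z", "B": "T", "C": "E", "D": "S"}
print(key_is_plausible(good_key, symbol_counts, frequent_cutoff=3))  # True
print(key_is_plausible(bad_key, symbol_counts, frequent_cutoff=3))   # False
```

Because a check like this looks only at the key and never touches the message text, it costs almost nothing per key, which is the whole point of running it first.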
Not being a mathematician (I blame the lousy Calculus teachers I had at
Syracuse University for squelching my confidence in my ability to do
math), I have no idea just how much the above will cut down my solution
space. Still, I expect it slices the space down pretty handily overall. To
implement the above rules, I developed a scoring system that requires a
potential solution to pass through a number of "gates" before going on to
an analysis that is more thorough or takes more computer time. The scoring
method works something like this:
-
I generate a possible message key.
-
I attempt to decrypt the message using that key.
-
I run the potential solution through a character counter to verify
that it has approximately the right number of the "most common"
characters. If not, I move on to the next key.
-
I compare the count of each individual character against its
typical frequency in English. If a given character shows up too
often or too rarely compared with what's expected, I move on to the
next key.
-
I compare the message against the most common bigrams and trigrams
used in English. If those don't occur in approximately the right
proportions, I move on to the next key.
-
I compare the message to a dictionary of 20,000 English words. I
weight the scoring in favor of larger words, and heavily in favor of
words the killer used most frequently (especially those he liked to
misspell). The more words I find and the larger the words are, the
higher the "score" I get for the message.
-
I analyze the percentage of the characters in the solution that are
"swallowed up" by the words I found in the message. The higher the
percentage of "words to overall characters" the more likely this key
is to have broken the message, so the higher the score will be.
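To make the "gates" idea concrete, here is a minimal Python sketch of the steps above. It is a simplification under stated assumptions, not the actual program: the function names, the threshold values, the short lists of common letters, bigrams and trigrams, and the greedy word matcher are all made up for illustration. The per-letter frequency comparison is reduced to a single "share of common letters" test to keep the example short.

```python
# Assumptions: small hand-picked sets standing in for real frequency tables.
COMMON_LETTERS = set("ETAOIN")
COMMON_BIGRAMS = {"TH", "HE", "IN", "ER", "AN", "RE", "ON", "AT", "EN", "ND"}
COMMON_TRIGRAMS = {"THE", "AND", "ING", "HER", "ENT"}


def decrypt(ciphertext, key):
    """Substitute each cipher symbol using the key ('?' if unmapped)."""
    return "".join(key.get(symbol, "?") for symbol in ciphertext)


def common_letter_share(text):
    """Fraction of the text made up of the most common English letters."""
    return sum(1 for ch in text if ch in COMMON_LETTERS) / len(text)


def ngram_share(text, n, common):
    """Fraction of the text's n-grams that appear in the 'common' set."""
    total = len(text) - n + 1
    if total <= 0:
        return 0.0
    return sum(1 for i in range(total) if text[i:i + n] in common) / total


def word_score(text, weighted_words):
    """Greedy left-to-right match of the longest known word (2+ letters).

    Returns (score, coverage): longer and more heavily weighted words
    score higher; coverage is the share of characters inside found words.
    """
    longest = max(map(len, weighted_words), default=0)
    i = score = covered = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 1, -1):
            word = text[i:i + size]
            if word in weighted_words:
                score += size * weighted_words[word]
                covered += size
                i += size
                break
        else:
            i += 1  # no word starts here; slide forward one character
    return score, covered / len(text)


def score_candidate(ciphertext, key, weighted_words,
                    min_common=0.35, min_bigram=0.05, min_trigram=0.01):
    """Return a score, or None if the candidate fails a cheap gate."""
    text = decrypt(ciphertext, key)
    if not text or common_letter_share(text) < min_common:
        return None
    if ngram_share(text, 2, COMMON_BIGRAMS) < min_bigram:
        return None
    if ngram_share(text, 3, COMMON_TRIGRAMS) < min_trigram:
        return None
    score, coverage = word_score(text, weighted_words)  # expensive step last
    return score * (1 + coverage)


# Toy run: a 7-symbol "cipher" whose correct key spells THEHAND.
words = {"THE": 1.0, "HAND": 1.0, "KILL": 3.0}  # weights are hypothetical
key = dict(zip("abcdefg", "THEHAND"))
print(score_candidate("abcdefg", key, words))                        # 14.0
print(score_candidate("abcdefg", dict.fromkeys("abcdefg", "Q"), words))  # None
```

The ordering is the design choice that matters: each gate is cheaper than the one after it, so most bad keys are thrown out before the dictionary lookup ever runs.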
In the next installment, I'll talk more about the program's logic to try and accomplish the above.
Last Updated ( Thursday, 04 May 2006 )