Archive

Posts Tagged ‘cipher program’

Cracking the Zodiac Killer’s “340 Cipher” Part 3

May 7th, 2006

In the last installment, I told you something about the shortcuts and general flow of execution of the custom Visual Basic program I’ve written to try to crack The Zodiac Killer’s thus-far-unbroken cipher known colloquially as “The 340 Cipher”.  This time around I’ll tell you a bit more about the program itself.

The program makes all of its “deciphering” decisions based on what I’m calling a “gated scoring algorithm” intended to expend the least possible effort determining that a potential “decode” of the message really does look like a decode.  The algorithm works something like this:

  • First, the program looks and counts the frequency of letters in the “decoded” message.  If the letters which appear most commonly in normal English writings appear in approximately the right frequency in the message, it generates a base score.  If that base score is too low (i.e., there are not enough of the most-common letters in the message), scoring stips here before too many processor cycles are used.
  • If the program finds “approximately” the right frequency of letters in the message, it then gets more granular about the letters it’s looking for.  It makes sure that in the decoded message it finds the approximately-correct frequency of As, Bs, Cs, etc.  Each letter that is occurring in approximately the right frequency (+ or – 20% of normal English text) gets a higher score than those which don’t.  Any letter appearing “too often” deducts from the total score (e.g., lots of “Zs” would drop the overall score).  If letters appear in the “decoded” message in approximately the right frequency, scoring continues. Otherwise, it stops here before more cycles are wasted.
  • If the frequency of individual letters looks good, the program looks at the most common bigraphs in the English language and compares these to the message.  If it finds approximately the right percentage, scoring moves on. Otherwise, it stops.
  • If the frequency of bigraphs looks about right, it then looks at a more granular list of bigraphs, trigraphs, and quadgraphs and scores the message based on whether these seem to be appearing in about the right amounts for a normal English text.  If so, the score is increased. If not, it isn’t.  If the score isn’t sufficiently high enough, no more scoring effort is performed.
  • Assuming the breakdown of bigraphs, trigraphs, and quadgraphs is within a reasonable tolerance from “normal” English text, the program then pores through a 20,000-word English dictionary, going from the longest to the shortest words. This dictionary provides a score for each word, with added weight given to those words the killer used often that don’t occur normally in English writing (like the killer’s tendency to misspell “having” as “haveing”).  This part of the scoring process can take several seconds of elapsed time to complete, so it is only done by the program when there is a very good chance of finding lots of English words in the text.

I call this a “gated scoring algorithm” because the potential “decode” of the message must achieve a certain predefined score before it can get through the “gate” into a more time-consuming scoring method.  This method allows the program to “fly” past potential decodes that are worthless (like something that generates nothing but “QZCZQ” type text) and spends the most time on “decodes” that statistically look like English text.

Read more…

admin General Computer Topics , , , , , ,