Swedes crack Zodiac’s map code?

Crypto enthusiast Nick Pelling recently posted this interesting article about two Swedish engineers who developed a theory about the Zodiac’s unsolved 32-character “map code”.

I agree with Nick’s assessment that the theory is ridiculously convoluted, and that it cannot possibly be correct.

The map code is the most difficult of Zodiac’s cryptograms to solve, because only three of its symbols are repeated. With so many unique symbols, numerous solutions can fit into those 32 characters. By itself, a solution to the cryptogram is impossible to validate. But maybe someone can discover some indisputable connection between the map and the code. Keep trying!

2013 Cryptologic History Symposium

I had the pleasure of attending this year’s Cryptologic History Symposium, held on Oct 17 and 18 in Laurel, Maryland. The event is organized every two years by the non-profit foundation that supports the NSA’s National Cryptologic Museum.

Crypto conference schwag

The proceedings were filled with dozens of fascinating talks by noteworthy professionals in the world of cryptology, and I met several interesting and colorful personalities. Here are some highlights.

NSA Deputy Director John C. Inglis kicked off the symposium with his keynote talk, in which he spent a lot of time defending the NSA in light of the numerous unauthorized disclosures by Edward Snowden of the NSA’s secret and controversal surveillance programs.

NSA Deputy Director John C. Inglis

York University professor Craig Bauer told a fascinating tale about famed computer scientist Alan Turing’s work on SIGSALY, a telephone voice scrambling machine developed during World War II. Created before the digital age, the machine was humongous and required carefully protected phonograph disks to store the cryptographic keys consisting of random noise.

Craig Bauer giving his talk
SIGSALY, a World War II era speech encryption machine

I spoke with Dr. Bauer and learned that he had included the Zodiac Killer in an interesting talk he gave about famous unsolved codes and ciphers. He’s also working on an upcoming book about this topic. Videos of his talk are available on Youtube:

Part 1:

Part 2:

Click this link to go directly to the Zodiac portion of his talk which starts 12 minutes and 32 seconds into the video. He gives a brief summary of the case, and touches on Gareth Penn’s radian theory and the New York Zodiac copycat killer Heriberto Seda.

Dr. Bauer talks about the Zodiac ciphers

He even gave Heriberto Seda’s cryptogram as an exercise for students of York’s cryptology course.

Can you crack it? Click the image to download the exercises, which include many other interesting ciphers.

At the conference I also met German cryptology author Klaus Schmeh, who writes an interesting German-language blog about various cryptologic mysteries, including the Zodiac ciphers. His book Nicht Zu Knacken (which I wrote about here) has a chapter devoted to the Zodiac ciphers. At the conference, Klaus gave a talk about compiling a list of 32 encrypted books. He showed some examples, ranging from obscure encrypted texts, to more well-known books such as the Voynich Manuscript and Codex Rohonci. He even brought his own hand-made reproduction of the Codex Rohonci as a prop for people to look through.

Video of Klaus’ talk (Part 1):

Video of Klaus’ talk (Part 2):

(The volume is faint, so you’ll need to crank up the volume on your speakers)

Video game developer, cryptology enthusiast and Kryptos expert Elonka Dunin gave a talk about the use of cryptology in the recreational activity of Geocaching. I was surprised to learn that many caches require solving cryptograms and other puzzles before you can learn the location of the caches. She told me that some of the puzzles remain unsolved. You can view them here. Some puzzles are even based on the German Enigma machine.

Geocache cryptogram, courtesy of “Oceana Abel” on DeviantArt.

Dr. Todd Mateer spoke about his cryptanalysis of one of the Beale ciphers. In the original document, James B. Ward’s The Beale Papers, each number in the solved cipher corresponds to a word from the Declaration of Independence. Mateer tried to reconstruct the solution based on the procedure outlined in Ward’s paper. He found that he ran in to trouble due to the numerous variations of the Declaration of Independence that were published in Ward’s day. He also found that there are mistakes in the encipherment, such as duplicate numberings in the version of the DOI in Ward’s paper. How is it that the so-called Beale cipher uses the exact same DOI variant, and the exact same encipherment mistakes, as that of Ward’s decipherment? What are the chances that Beale and Ward made precisely the same mistakes? Mateer’s conclusion is that it’s because the Beale treasure is likely to be a hoax, invented by whomever authored The Beale Papers. Nick Pelling offers the alternate view that the oddities found in the unsolved Beale ciphers reflect a change in the encipherment procedure that causes bits of an encrypted key to show through. Some experiments in this direction might help clarify his idea.

The talk with the most attention-grabbing title was Shame, Sex and Alcohol: Ciphers in the Context of Everyday Practices of Secrecy in Early Modern Times, about encrypted messages left by noteworthy Hungarian historical figures in diaries and correspondences. Here is one of the more colorful excerpts:

Their secrets live on.

The conference was very educational, provoking a thirst for more knowledge and discovery. I don’t want to miss the next one in 2015!

Try, try again

From an upcoming natural language process conference in Seattle comes this paper from UC Berkeley: Decipherment with a Million Random Restarts.

In the paper, Berk-Kirkpatrick and Klein investigate the effectiveness of the “expectation-maximization algorithm” (EM) when it is used to search for solutions to homophonic ciphers. EM explores cipher solutions by using a statistical model of the English language. The paper describes EM getting stuck in local optima, a common problem with these kinds of automatic decipherment algorithms.

To understand the local optima problem, think of a rugged landscape. Your task is to find the largest mountain in the landscape, but you can’t see more than a few yards ahead of you. You walk around, thinking that your best bet is to find the steepest slope and follow it, You get to the top of a hill, thinking you’ve found the tallest mountain, but you’re only standing atop a medium-sized one. How can you find the tallest one?

For the EM algorithm, the answer is to randomly restart the search many times. In the rugged landscape, this translates to randomly and blindly teleporting you somewhere else in the landscape. Eventually, one of the slopes you follow will be the right one.

Zkdecrypto uses a similar approach. At each step, zkdecrypto is making small changes to a solution key, and keeping the changes that make the plain text look a little bit more like English. But it, too, might get stuck on a “small hill”: no matter what changes it makes to the key, the plain text doesn’t improve, and the real one remains beyond its reach until enough restarts are done.

Long story short: If at first you don’t succeed, try, try again.

The paper tries to answer this question: How many restarts are needed to get the right solution? The researchers’ answer is that it depends on the cipher. They looked at the Zodiac’s 408, which turns out to be an easy cipher to solve: Accurate solutions are discovered without more than a hundred random restarts. Then the researchers created their own 340-character cipher, and accurate solutions were found using hundreds of thousands of restarts. They distributed the numerous restarts to the 512 computational cores of a powerful computer graphics card, an approach that is thousands of times faster than running the restarts one at a time in sequence. The increased difficulty in solving the test cipher is attributed to its smaller length, and its larger number of cipher symbols. But even with only a few hundred restarts, more than half of the correct solution to their test cipher is discovered by the program, which is usually enough for a human solver to take over and complete the solution.

The researchers unfortunately repeat the claim that in 2011, Ravi and Knight were the first to crack the 408 using completely automatic methods. Zkdecrypto has been automatically cracking the 408 since 2006.

Finally, Berk-Kirkpatrick and Klein investigated Zodiac’s 340, and mention the possibility that it may not be a homophonic cipher, or even a valid cipher whatsoever. No one even knows its proper reading order. This paper is the first I’ve seen that presents specific evidence to support the argument that the 340 is not a homophonic cipher in a normal reading order. First, the researchers generated 100 more test ciphers that are similar to the 340, and ran their EM algorithm on them with 10,000 random restarts. The result was an average accuracy of 75%, where only two ciphers had less than 51% accuracy. If the real 340 is “normal”, at least part of its message would surely have been revealed by now. When they ran their EM algorithm on the real 340, the best results were still nonsensical. They also found that the search landscape looks much different for the real 340 than it does for their test ciphers, suggesting a strong difference in its construction.

I’m curious about how closely their test ciphers resemble the real 340. Do they contain similar cyclic sequences of homophones? Do they have the weird pivots? What about spelling errors and encipherment errors, which were abundant in the 408? At this point, it seems best to try to guess what Zodiac may have done to make the cipher unsolvable. Then we can generate more test ciphers and automate their solutions. If the real 340 is constructed the same as the test ciphers, then its solution, too, will emerge.

Then again, perhaps we should heed W.C. Fields’ advice:

If at first you don’t succeed, try, try again. Then quit. There’s no point in being a damn fool about it.

BTK word puzzle analysis

Below is contributor Chris Klein’s interesting analysis of the BTK word puzzle. He breaks down the overall puzzle into three smaller puzzles that are suggested by the appearance of clean breaks between major words and the placements of the letter “X”.

Puzzle 1:


Notes:

  • This puzzle lacks numbers.
  • The locations of the red X‘s suggest puzzle boundaries.

Puzzle 2:


Notes:

  • The locations of the red X‘s suggest puzzle boundaries.
  • In row 1 column 7 there appears to be a missing number. The letters R, A and D are near this missing number.

Puzzle 3:


Notes:

  • The locations of the red X‘s suggest puzzle boundaries.
  • In row 11 column 3 there appears to be a missing number. The letters E and R are near this missing number.
  • This puzzle lacks numbers.
  • Very many of this puzzle’s letters are used in words.

I believe that the X‘s are some type of directions for the orientation of the three puzzles.

Notes about each puzzle and final summary.

Summary:

Each of the puzzles has its own theme with regard to the hidden words within. The theme of each puzzle is as follows:

  • Puzzle 1 ‐ Recon
  • Puzzle 2 ‐ Location
  • Puzzle 3 ‐ Method of operation
  • Puzzle 4 ‐ Identity?

In each puzzle I did the following:

  • Isolate all obvious words by color (yellow).
  • Isolate all sequential and repetitive keystrokes that were not part of an obvious word by color (green).
  • Isolate all “X”s that were non‐sequential by color (red). (I was drawn to isolate the “X”s separately because of the somewhat uniform location of the “X”s throughout the puzzles.
  • Isolate all remaining, apparently random, keystrokes by color(purple).
  • Isolate all numbers by color (blue). (I took some liberty here. In the “original” document that I worked from, the numbers were oddly spaced so that they were near adjacent alphabetic keystrokes. I noticed that for every oddly placed number there was a blank space to its left. I shifted the numbers to the blank spaces to the left. After doing this I noticed there were 2 blank spaces left on the chart. Interestingly the blank space in puzzle 2 is surrounded by the letters R, A, D. The blank space in puzzle 3 is preceded by E, R.
  • In looking at the progression of the 3 puzzles it seems as if the creator became more efficient as he completed each puzzle. Puzzle 1 seems to have a lot of non‐sequential letters that don’t appear to be part of any hidden words whereas puzzle 3 has very few non‐sequential letters that aren’t part of any obvious words. One question I would ask is why the creator didn’t use sequential letters in puzzle 3 in certain places. Maybe there is no reason but there are only (5) non‐ sequential letters that aren’t part of any obvious word and it seems it would have been easy enough to replace those with some other letter.
  • I did not go into the meanings of the different numbers on the chart.
  • Puzzle 1 ‐ Recon

    This puzzle contains no numbers and seems to be pretty uncomplicated.

    Puzzle 2 ‐ Location

    This puzzle seems to be uncomplicated at first but unlike the previous puzzle it contains sets of numbers. It also contains 3 “X”s on or near the same locations as puzzle 1. This puzzle also contains a blank space adjacent to the letters R,A,D.

    Puzzle 3 ‐ Method of operation

    This puzzle appears to be the most detailed of the 3 puzzles. Unlike the previous puzzle there are no numbers but there is a blank space again. This blank space is preceded by the letters E, R. It also contains 3 “X”s on or near the same locations as puzzles 1 and 2.

    Puzzle 4 ‐ Identity?

    Not sure if one would say that all of these puzzles are just parts on 1 main puzzle but I like to consider them separately. That said, I believe the RADER name is announced by the blank space in puzzle 2 and the blank space in puzzle 3. If that were true then I would consider the grouping of puzzles 2&3 to be a separate puzzle and have labeled it as such.
    Anyhow, thats what I have.

Thanks, Chris!

New Scientist

The May 21, 2011 issue of New Scientist ran a feature called “Uncrackable Codes”, which featured MacGregor Campbell’s summaries of eight famous unsolved mysteries: Somerton Man, Beale’s buried treasure, the MIT time-lock puzzle, Kryptos, the Voynich Manuscript, Enigma, Elgar’s unread message, and the Zodiac Killer.

The Zodiac feature doesn’t have any new details, but here are some highlights:

  • “In November 1969, Zodiac sent a code to the local papers that law-enforcers still believe could hold the key to solving the case.” This seems like wishful thinking. Cracking the 408 didn’t catch the killer. But maybe this wish will come true.
  • FBI crypto chief Dan Olson says the 340 “is number one on his unit’s internal ‘top 10’ list of unsolved codes”, and that that he gets about 20 to 30 submissions every year from the public. None have led to breakthroughs.
  • FBI cryptanalysts believe the 340 contains a real message, since the distribution of characters in the rows is not equal to the distribution of characters in the columns. Olson describes this in more detail in the emails he sent to Tom Voigt back in 2009.
  • In 2009, computer scientist Ryan Garlick led his students in an attempt to use genetic algorithms to crack the cryptograms. The attempt was successful for the 408, but not the 340.
  • Another attempt at San Jose State University also failed to produce a solution for the 340.
  • Garlick thinks to crack the 340, its symbols need to be rearranged somehow. But he says figuring out the rearrangement is very difficult: “You have to happen upon exactly the right thing before any of our software tools would even get close.”

These kinds of “Top Unsolved Codes” lists appear from time to time, and usually only contain the briefest summaries of the mysteries. The MIT time-lock puzzle is one I haven’t seen before. And it was nice to see in the Zodiac feature a summary of academic attempts to crack the codes. How many other places in academia have been working on the dusty old Zodiac ciphers? The ones that I’ve come across include:

All the research seems to be centered on attacking the 340 as if it is a simple substitution cipher. Most papers report succeeding at breaking the 408 and failing with the 340. I’ve yet to see any academic research into the idea of rearranging the 340’s symbols, or exploring its other qualities that might offer clues into how it is truly constructed. This seems to point to falsifying the hypothesis that the 340’s plain text is written in valid English, arranged in a normal direction, and enciphered using straightforward homophonic substitution. My guess is that exploring all the strange variations that are possible would result in tools that are too specific to apply to other “pen and paper” style ciphers. A lot of work for potentially zero reward. Who’s up to the challenge?

Mountain of evidence?

Audrey Cooper, managing editor of the San Francisco Chronicle, recently tweeted this:


The thickness of that stack of papers suggests lengthy and convoluted attempts to justify the claimed solution. It’d be interesting to know what approach the solver took. In the photo, the letter starts by discussing the cycling of variants (also known as homophones) in the 408:

The above cipher variants were, for the most part, made in a cyclic order, deteriorating toward the end of the third part of the message

The letter shows the key, with all the variants (homophones) displayed beneath each plain text letter:

Then there’s a breakdown of the key, showing normal-looking alphabetic cipher symbols first, followed by the symbols that look like backwards letters, and then the symbols that look like shapes and punctuation:

The letter writer points out the interesting fact that “LMN” happens to decode to “THE”.

So far, the letter is off to a reasonable and logical start. I wonder where it leads.

The file of solution claims at the Chronicle must be massive. Reporter Kevin Fagan once told me they still get about two submissions per week. It’d be interesting to look at the file to see all the different approaches people have taken, and to find out how many of them fall into the usual traps, and if any of them supply new and useful ideas.

My Name Is… Sarah The Horse?

The 13-character cryptogram mailed by Zodiac on April 20, 1970 to the San Francisco Chronicle remains unsolved, despite many attempts to find solutions that fit into the cipher text.

Did Zodiac really encipher a name in this letter? If we assume that the cipher is a simple substitution cipher, read from left to right, then it is difficult to find real names that fit. This is because only 8 of the 13 symbols are unique. The repeated symbols are:

  • : Occurs 3 times.
  • : Occurs 2 times.
  • : Occurs 2 times.
  • : Occurs 2 times.

Assuming that each of those symbols can only represent one plaintext letter, many names we try to plug in will fail, because they will violate the cipher’s constraints.

We can roughly estimate the probability of 13 characters of plain text fitting into the cipher text. First, we start with these event probabilities:

  • Probability of two letters repeating in English (p2): About 0.0655.
  • Probability of three letters repeating in English (p3): About 0.00535.

Since we must have three identical letters, AND three sets of identical pairs, we must multiply the probabailities of each event together:

p3 * (p2)3 = 1.5 x 10-6, or about 1 in 670,000

Another way to look at this is to consider all possible 13-letter plain texts. Each position can be one of 26 letters, so that means for 13 spots there are 2613 possible plain texts (about two quintillion). But there are only 8 different symbols in the cipher, which means there are really only 268 possible plain texts (about 200 billion). That means that only about 1 in 12,000,000 plain texts will fit into the cipher without violating its constraints.

I found a database of a hundred million person names and wrote a program that scans them to look for names that fit in the cipher. When examining a name, the program allows the first, middle, and last names to appear in different orders, and allows any of the name parts to be abbreviated with initials. It discards anything that isn’t exactly 13 letters long. This results in almost a half billion tests of name combinations. The algorithm found only 213 plain texts that fit the cipher text. That works out to about 1 hit for every 2,000,000 attempts. Here is a sampling of the more interesting names that fit:
Continue reading

BTK word search

Like Zodiac, serial killer Dennis Rader, also known as BTK, taunted police and newspapers with letters boasting of his crimes.

Among the many correspondences was a word puzzle Rader sent to Wichita television station KAKE on May 5 of 2004 [1] [2], shown here:

It was not difficult for people to find several words relevant to Rader’s crimes, such as:
Continue reading