Gutenberg addendum: Small ciphers

The program I wrote for the Project Gutenberg crib experiments also looked for pieces of text that fit into the entire 13-character cipher.


This cipher has many repeated symbols: one symbol appears three times, and three other symbols appear twice each. Nevertheless, the program found over 2,700 unique pieces of text that still fit “as is” into the cipher text. Here are some interesting examples:

“My name is: afraid it is rat”

“My name is: I have here fair”

“My name is: in the cement I’m”

“My name is: in the desert is”

“My name is: of the defeat of”

“My name is: of the defect of”

“My name is: of the newer tow”

“My name is: of these men Tom”

“My name is: on the defeat of”

“My name is: plenty to the po”

“My name is: south the house”

“My name is: sweets to the so”

“My name is: those were not r”

“My name is: twenty to the to”
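The “fit as is” test behind these examples can be sketched roughly as follows. This is a minimal illustration, not the program used for the experiments. The string CIPHER_13 is an assumed stand-in in which each letter labels one cipher symbol; it only encodes the repeat structure described above (one symbol three times, three symbols twice), which all of the listed examples share. The function names are hypothetical. The check lets two different symbols decode to the same letter, as happens in “afraid it is rat”.

```python
# Assumed stand-in for the 13-character cipher: each letter is a placeholder
# label for one cipher symbol, not the real glyph.
CIPHER_13 = "ABCDEFEGEHCAG"


def normalize(text):
    """Keep letters only, lowercased, so "I'm" becomes "im"."""
    return "".join(ch.lower() for ch in text if ch.isalpha())


def fits(cipher, candidate):
    """Return True if each cipher symbol always lines up with the same letter.

    Different symbols may share a letter; a repeated symbol may not
    stand for two different letters.
    """
    plain = normalize(candidate)
    if len(plain) != len(cipher):
        return False
    mapping = {}
    for symbol, letter in zip(cipher, plain):
        # setdefault stores the first letter seen for this symbol and
        # returns the stored letter on later occurrences.
        if mapping.setdefault(symbol, letter) != letter:
            return False
    return True


if __name__ == "__main__":
    for text in ["afraid it is rat", "south the house", "hello world abc"]:
        print(text, fits(CIPHER_13, text))
```

A search along these lines would slide a 13-letter window over each book's text and keep every window for which fits returns True.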

Here are the full results if you want to take a look:

  • constraint-search-13-data.txt: Every plaintext found that fits into the cipher text. Record format: Book file name, plaintext, zkdecrypto non-unique score, zkdecrypto unique score
  • constraint-search-13-data-uniq-with-counts.txt: The unique plaintexts that fit into the cipher text, in descending order of how many times each plaintext was found. Record format: Number of occurrences, plaintext, zkdecrypto non-unique score, zkdecrypto unique score
  • constraint-search-13-data-uniq-sorted-by-zkscore.txt: Same as above, but sorted in descending order of zkdecrypto non-unique score. Record format: Number of occurrences, plaintext, zkdecrypto non-unique score, zkdecrypto unique score

Note: “zkdecrypto non-unique score” is the traditional zkdecrypto score, which adds up frequencies for all n-grams, including duplicates. “zkdecrypto unique score” is very similar, but it counts each distinct n-gram only once.
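The difference between the two scores can be illustrated with a small sketch. The trigram weights below are made-up toy values, and the function names are hypothetical; zkdecrypto's actual frequency tables and scoring details are not reproduced here.

```python
# Hypothetical toy trigram weights, for illustration only.
TOY_WEIGHTS = {"the": 10, "het": 2, "eth": 1}


def ngrams(text, n=3):
    """Return every overlapping n-gram of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def non_unique_score(text, weights, n=3):
    """Add the weight of every n-gram occurrence, duplicates included."""
    return sum(weights.get(gram, 0) for gram in ngrams(text, n))


def unique_score(text, weights, n=3):
    """Add the weight of each distinct n-gram once."""
    return sum(weights.get(gram, 0) for gram in set(ngrams(text, n)))


if __name__ == "__main__":
    sample = "thethe"  # trigrams: the, het, eth, the
    print(non_unique_score(sample, TOY_WEIGHTS))  # 10 + 2 + 1 + 10 = 23
    print(unique_score(sample, TOY_WEIGHTS))      # 10 + 2 + 1 = 13
```

With a scheme like this, a plaintext that repeats one common n-gram gets a higher non-unique score than unique score, which is why both are recorded.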

Again, we are faced with the question: Is any of these solutions the correct solution? This remains a mystery, because there are so many possible solutions to choose from, and there’s no objective way to isolate the correct one without compelling corroborating evidence.

One thing I find interesting about the 13-character cipher is that it contains so many repeated symbols for such a short cipher. In fact, neither the 408 nor the 340 contains as many repeated symbols in any stretch of only 13 characters. The smallest chunk of the 408 that has as many repeated symbols as the 13 is this 22-character chunk:

This smaller 15-character chunk of the 340, on the other hand, contains as many repeated symbols as the 13:

Compare that to the 32-character “map code” cipher, which has too few repeated symbols for us to ever find a verifiable solution without some other corroborating evidence:

I still have hope that a solution can be found AND verified for the 340-character cipher. Solutions for the 13- and 32-character ciphers are probably beyond our reach, unless some extremely compelling evidence is found.
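The repeated-symbol comparison between the ciphers can be sketched as a sliding-window search. This is only an illustration under a stated assumption: “repeated symbols” is taken here to mean the chunk length minus the number of distinct symbols in the chunk, which may not be the exact measure used above. The function names and the sample strings are hypothetical, with one character standing for one cipher symbol.

```python
def repeat_count(chunk):
    """Assumed measure: chunk length minus the number of distinct symbols."""
    return len(chunk) - len(set(chunk))


def smallest_window(cipher, target):
    """Find the shortest chunk of cipher with at least target repeats.

    Returns (start, length) for the first such chunk, or None if even
    the whole cipher has fewer repeats than target.
    """
    for length in range(1, len(cipher) + 1):
        for start in range(len(cipher) - length + 1):
            if repeat_count(cipher[start:start + length]) >= target:
                return start, length
    return None


if __name__ == "__main__":
    # Stand-in labels for the 13-character cipher's repeat structure.
    print(repeat_count("ABCDEFEGEHCAG"))    # 13 symbols, 8 distinct: 5
    print(smallest_window("ABCABCXYZ", 3))  # (0, 6)
```

Running smallest_window on a transcription of the 408 or the 340 with the 13's repeat count as the target would give the kind of chunk lengths quoted above, provided the same measure is used.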