Throw the book at him! [Part 1]
One way to attack a substitution cipher is to guess part of the plain text solution, “plug” it into the place in the cipher text where you think it fits, and then see whether more of the solution appears in the rest of the cipher text. The guessed text is known as a “crib”, and this type of attack can be useful if you get lucky and pick the right text.
Donald Harden, along with his wife, solved the 408-character cipher within a week after seeing it. (Image courtesy of zodiackiller.com)
Even with only the small phrase “I LIKE KILLING” used as a crib, other easily solved pieces pop out of the puzzle.
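Here is a minimal sketch of what “plugging in” a crib means in code. The cipher below is a made-up toy (the numbers stand in for cipher symbols and are not transcribed from the real 408-character cipher), and the function names `apply_crib` and `reveal` are my own for illustration.

```python
def apply_crib(cipher, crib, offset):
    """Place `crib` at `offset` and return the symbol -> letter key it implies.

    Returns None if the crib does not fit there (it runs off the end, or one
    symbol would have to stand for two different letters).
    """
    if offset < 0 or offset + len(crib) > len(cipher):
        return None
    key = {}
    for symbol, letter in zip(cipher[offset:offset + len(crib)], crib):
        # setdefault records the first letter seen for a symbol and
        # returns the recorded one on every later occurrence.
        if key.setdefault(symbol, letter) != letter:
            return None
    return key


def reveal(cipher, key):
    """Decode every symbol the key knows about; show '_' for the rest."""
    return "".join(key.get(symbol, "_") for symbol in cipher)


# Toy cipher text: each number is one (hypothetical) cipher symbol.
TOY_CIPHER = [1, 2, 1, 3, 4, 3, 1, 2, 2, 1, 5, 6, 7, 4, 8, 7, 2, 4]

key = apply_crib(TOY_CIPHER, "ILIKEKILLING", 0)
print(reveal(TOY_CIPHER, key))  # ILIKEKILLING_E__LE
```

The letters that show up after the crib come for free from the symbols the crib already pinned down, and that is what makes the remaining gaps easier to guess.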
Project Gutenberg is a free digital library that has been around since 1971. They have so far collected over 39,000 books that fall in the public domain. The books are available in plain text format, which makes it easy to come up with computer programs that can process the text.
Can we search those 39,000 books to find cribs that fit into the cryptograms? Certainly. In fact, this is all too easy if the crib is short enough. But short cribs don’t really help us too much, since just about any small word will fit nearly everywhere in the cipher text. But maybe we can get lucky and find a large enough crib that will fit into the cipher text, in an area that is hard to fit cribs into.
Let’s look at an example.
The highlighted section of the solved 408-character cipher above is 19 characters long, and contains some repeated symbols. The repeated symbols make it harder to find plain text solutions that fit into the cipher text.
Above, you can see that one symbol repeats three times (purple highlight), a second symbol repeats three times (blue highlight), and a third symbol repeats twice (red highlight). So all of the purple letters have to be the same, all of the blue letters have to be the same, and both red letters have to be the same.
The known solution to this section of cipher text is “YOU MY NAME BECAUSE YOU.” What other solutions fit there?
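The “same symbol, same letter” rule can be sketched as a small check. The symbol numbers below are placeholders, not the real symbols. The repeat positions (letters 1, 5, and 17; letters 3, 14, and 19; letters 4 and 8) are inferred from the phrases quoted in this post rather than transcribed from the cipher, so treat them as an assumption. Two different symbols are allowed to stand for the same letter, as they do in the 408-character cipher.

```python
def letters_only(text):
    """Upper-case the text and drop everything except A-Z."""
    return "".join(ch for ch in text.upper() if "A" <= ch <= "Z")


def fits(symbols, plaintext):
    """True if every repeated symbol lines up with one and the same letter."""
    if len(symbols) != len(plaintext):
        return False
    seen = {}
    for symbol, letter in zip(symbols, plaintext):
        # A symbol that was already matched to a letter must match it again.
        if seen.setdefault(symbol, letter) != letter:
            return False
    return True


# Placeholder IDs for the 19-symbol segment: 1 = purple, 3 = blue, 4 = red.
SEGMENT = [1, 2, 3, 4, 1, 5, 6, 4, 7, 8, 9, 10, 11, 3, 12, 13, 1, 14, 3]

print(fits(SEGMENT, letters_only("YOU MY NAME BECAUSE YOU")))   # True
print(fits(SEGMENT, letters_only("treated a couple of the")))   # True
print(fits(SEGMENT, letters_only("nothing fits in this spot")))
```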
A program I wrote searches through all 39,000 books of Project Gutenberg’s collection, looking at about eleven billion characters (which is roughly two billion words). The program prints out any plain text it discovers that can fit into the cipher text in various places. It found about 2,000 pieces of text that fit in the cipher text segment above. But none of them were “YOU MY NAME BECAUSE YOU”, because that phrase did not appear in any of the 39,000 books. Here are some examples of the matches it did find:
Correct solution for comparison:
Other solutions found:
Title: The Valley of the Moon, Author: London, Jack, 1876-1916
That’s right… the loser’s end… twenty dollars. I had some drinks, an’ [treated a couple of the] boys, an’ then there was carfare.
Title: Beacon Lights of History, Volume 09 European Statesmen, Author: Lord, John, 1810-1894
It was the wonder of O’Connell how they could remain cheerful amid such privations and such wrongs, with the government seemingly indifferent, with n[one to pity and few to he]lp.
Title: Odd Numbers Being Further Chronicles of Shorty McCabe, Author: Ford, Sewell, 1868-1946
Well, maybe I oughtn’t to call ’em that, [either. They can’t seem t]o help gettin’ that way, any more’n other folks can dodge havin’ bad dreams, or boils on the neck.
Title: Maid Marian, Author: Peacock, Thomas Love, 1785-1866
and is expounded by the learned doctor Alcofribas, who has [treated at large on the] subject, to signify “drink.”
Title: Jude the Obscure, Author: Hardy, Thomas, 1840-1928
At the moment [that the train came to a] stand-still by the Melchester platform a hand was laid on the door and she beheld Jude.
The bad news is that many pieces of plain text fit into this part of the cipher text, and none of them is the correct one. But the good news is that we eliminated 99.999982% of the eleven billion possibilities from the entire collection of Project Gutenberg. That matters if we do get lucky and discover a crib that matches the correct solution: it would be one of a couple of thousand candidates to check instead of one of billions.
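The search itself can be sketched as a sliding window over the letters of a book. This is not the program I actually ran, just a small illustration under a few assumptions: the symbol IDs are placeholders whose repeat positions are inferred from the matches quoted above, and `find_fits` and `eliminated_share` are hypothetical names.

```python
def find_fits(symbols, text):
    """Return every stretch of `text` whose letters fit the symbol pattern."""
    # Remember where each letter sits in the original text so a match can be
    # quoted with its spaces and punctuation intact.
    letters = [(i, ch.upper()) for i, ch in enumerate(text)
               if ch.isascii() and ch.isalpha()]
    n = len(symbols)
    matches = []
    for start in range(len(letters) - n + 1):
        window = letters[start:start + n]
        seen = {}
        # The window fits if no symbol is forced to stand for two letters.
        if all(seen.setdefault(symbol, letter) == letter
               for symbol, (_, letter) in zip(symbols, window)):
            matches.append(text[window[0][0]:window[-1][0] + 1])
    return matches


def eliminated_share(matches_found, windows_checked):
    """Fraction of the candidate positions that the pattern ruled out."""
    return 1 - matches_found / windows_checked


# Placeholder IDs for the 19-symbol segment (repeats: 1, 3, and 4).
SEGMENT = [1, 2, 3, 4, 1, 5, 6, 4, 7, 8, 9, 10, 11, 3, 12, 13, 1, 14, 3]

sample = "I had some drinks, an' treated a couple of the boys, an' then there was carfare."
for hit in find_fits(SEGMENT, sample):
    print(hit)

# Roughly 2,000 fits out of roughly eleven billion characters:
print(f"{eliminated_share(2_000, 11_000_000_000):.6%}")  # 99.999982%
```

A real run over the whole collection would want something faster than rebuilding a dictionary for every window, but the logic is the same.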
So, we still have two thousand matches to investigate. How do we know which ones are wrong? When you plug in a crib, plaintext appears in the rest of the cipher, and a computer program can assign a score based on the quality of that plaintext. For example, many letter combinations, such as BK, MZ, PZ, QA, ZG, and ZQ, are rare. If too many of them appear in the plaintext, then the solution is probably not very good.
Another approach might be to construct a pattern-based dictionary that permits quick lookups of partial words and word combinations (see Edwin Olson’s paper in which he describes such an approach). Tentative solutions that yield many matches in a well-built pattern-based dictionary could be ranked higher than solutions that yield only a few. And you could enhance the dictionary to include an awareness of more common word patterns by using word-level n-gram data. This data is important because we can use it to tell a computer that the phrase “I like killing” is more likely to occur in natural language than “Monkey toothpick police.”
By scoring the text, we can rank the tentative solutions and focus our attacks on the ones that look the most promising.
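As a rough illustration of the rare-letter-pair idea, the sketch below counts how often the six pairs mentioned above show up in a tentative plaintext and sorts candidates so the cleanest ones come first. The function names are made up for this example, and a serious scorer would use full letter-pair frequencies measured from a large body of text instead of a short hand-picked list.

```python
# The rare letter pairs named in the text above.
RARE_PAIRS = {"BK", "MZ", "PZ", "QA", "ZG", "ZQ"}


def rare_pair_count(plaintext):
    """Count adjacent letter pairs that are on the rare list.

    Pairs that include a non-letter (such as '_' for a still-unknown
    symbol) never match, so gaps in a partial solution are simply skipped.
    """
    text = plaintext.upper()
    return sum(text[i:i + 2] in RARE_PAIRS for i in range(len(text) - 1))


def rank_candidates(candidates):
    """Sort tentative plaintexts so the fewest rare pairs come first."""
    return sorted(candidates, key=rare_pair_count)


candidates = ["ZQAMZ_E__LE", "ILIKEKILLING_E__LE"]
for text in rank_candidates(candidates):
    print(rare_pair_count(text), text)
```

Fewer rare pairs does not prove a candidate is right. It only tells us which candidates deserve a closer look first.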
Back to our 408-character cipher: Were ANY phrases from the known solution of the 408-character cipher found among the 39,000 books in Project Gutenberg?
Solution: … better than getting your rocks off with a girl [the best part of it is that w]hen I die I will be reborn in paradice and all…
Found in: Title: The Moving Picture Girls Under the Palms Or Lost in the Wilds of Florida, Author: Hope, Laura Lee
Excerpt: … fire, to be in this lovely, calm place?” “And [the best part of it is that w]e’re getting _paid_ for it!” observed a voice behind…
Solution: … dangeroue anamal of all to kill something gives m[e the most thrilling exper]ence it is even better than getting your rocks off…
Found in: Title: The Plattsburg Manual A Handbook for Military Training, Author: Ellis, Olin Oglesby, 1886-
Excerpt: … CHAPTER IX GENERAL PRINCIPLES OF TARGET PRACTIC[E The most thrilling exper]ience you will have at a training camp will probably…
Solution: … gives me the most thrilling experence it is even [better than getting your] rocks off with a girl the best part of it is that…
Found in: Title: Mauprat, Author: Sand, George, 1804-1876
Excerpt: … for me you would never have thought of anything [better than getting your]self made a Trappist, to ape devotion and afterward…
Title: A Jacobite Exile Being the Adventures of a Young Englishman in the Service of Charles the Twelfth of Sweden, Author: Henty, G. A. (George Alfred), 1832-1902
Excerpt: … and made me see that you are fit for something [better than getting your] throat cut.” The king then changed the subject…
Solution: … fun than killing wild game in the forrest because [man is the most dangerou]e anamal of all to kill something gives me the…
Found in: Title: Chambers’s Edinburgh Journal, No. 435 Volume 17, New Series, May 1, 1852, Author: Various
Excerpt: … of the two now; and of all animals, an enraged [man is the most dangerou]s and the most fearless. I gave him a blow between…
Title: Montlivet, Author: Smith, Alice Prescott
Excerpt: … can do harm enough, but a cowardly, soft-hearted [man is the most dangerou]s of knaves. I might have killed Pemaou when I…
Title: Eminent Victorians, Author: Strachey, Giles Lytton, 1880-1932
Excerpt: … once more he warned Manning to be careful. ‘Dr. New[man is the most dangerou]s man in England, and you will see that he will…
Title: The Mountains of California, Author: Muir, John, 1838-1914
Excerpt: … who chanced to be crossing the range in winter. [Man is the most dangerou]s enemy of all, but even from him our brave…
Solution: I like [killing people because] it is so much fun it is more fun than killing wild…
Found in: Title: Sunlight Patch, Author: Harris, Credo Fitch, 1874-1956
Excerpt: … he growled again. “But are you mad to go about [killing people because] they’re in your way? Don’t you know–” “I know a…
Solution: … will become my slaves I will not give you my name [because you will try to] sloi down or stop my collecting of slaves for my…
Found in: Title: King John of Jingalo The Story of a Monarch in Difficulties, Author: Housman, Laurence, 1865-1959
Excerpt: … no chance of obtaining a majority.” “It is only [because you will try to] do things too fast!” said the King; but the Prime…
This approach might be effective in helping to solve substitution ciphers such as the 408-character cipher. But will it work on the unsolved 340-character cipher? Maybe. But I suspect it won’t: if the unsolved cipher were truly a “normal” substitution cipher like the 408-character cipher, it very likely would have been solved by now.
In Part 2, we will continue to take a look at how we can use big collections of books like Project Gutenberg to investigate other aspects of these mysteries.