# Victim name patterns

“EPotts” on Tom’s forum posted this intriguing observation about the names of the Zodiac Killer’s victims:

May be nothing but I noticed names like FErrin-EDwards-STein-BAtes-yet only 2 out of the top 100 most common surnames have letters in alphabetical sequence like this. Maybe just a coincidence? I think it could be but thought I would mention it. sorry if this has been posted b4

Confirmed victims Darlene **Fe**rrin and Paul **St**ine have last names that begin with two letters that are adjacent in the alphabet. Unconfirmed victims Linda **Ed**wards and Cheri Jo **Ba**tes do also.

Let’s look at all the victim names. There are reportedly seven confirmed victims, including David Arthur Faraday, Betty Lou Jensen, Michael Renault Mageau, Darlene Elizabeth Ferrin, Bryan Calvin Hartnell, and Paul Lee Stine. There are at least four unconfirmed victims: Robert Domingos, Linda Edwards, Cheri Jo Bates, and Donna Lass (five if you include Kathleen Johns). People often debate these lists, but let’s go with the ten names listed here for now.
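To make the pattern concrete, here is a minimal Python sketch that checks the ten surnames used in this post. The function name `has_adjacent_start` is made up for illustration, and it assumes "adjacent" means the first two letters are neighbours in the alphabet in either order (FE, ST, ED and BA all count).

```python
def has_adjacent_start(surname: str) -> bool:
    """True if the first two letters are alphabet neighbours, in either order."""
    prefix = surname.strip().lower()[:2]
    if len(prefix) < 2 or not prefix.isalpha():
        return False
    return abs(ord(prefix[0]) - ord(prefix[1])) == 1

# The ten surnames this post works with (six confirmed, four unconfirmed).
victims = ["Faraday", "Jensen", "Mageau", "Ferrin", "Hartnell", "Stine",
           "Domingos", "Edwards", "Bates", "Lass"]

matches = [name for name in victims if has_adjacent_start(name)]
print(matches)  # ['Ferrin', 'Stine', 'Edwards', 'Bates']
```

Four of the ten names match, which is the count the rest of this post tries to explain.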

As it turns out, if you pick ten last names completely at random, there’s about a **1.3% chance** that at least four of them will have this interesting pattern. (If you want to torture yourself by seeing the math behind this, it’s at the end of this post.)

Seems very rare, doesn’t it? Did the killer select his victims because of this pattern in their last names?

Well, maybe. But we have to be careful here. We know that the chances of finding *this particular interesting pattern* are low. But we didn’t consider *other interesting patterns*.

For instance, what if 4 out of the 10 names start with the same letter? Or if they end with letters that are adjacent in the alphabet? Or if each name starts and ends with the same letter? If we were looking at another list of names, and it had one of these other kinds of patterns, would we think it was significant?

For that reason, we need to change the question from:

- What are the odds *this* interesting pattern happens completely by chance?

to:

- What are the odds *any* interesting pattern happens completely by chance?

To demonstrate this, click here to run an experiment that shows how patterns can be found in random names. When you click the button there, it will select 10 names at random from the 30,000 most common names in this US census data. The random selections are done 1,000 times. The experiment then displays any sets of names that matched at least four patterns.
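For readers who would rather see the idea in code, here is a rough Python sketch of that kind of experiment. It is not the code behind the experiment page: the names `has_adjacent_start`, `run_experiment` and `stand_in_pool` are made up, the pool is a small arbitrary list standing in for the census data, and it assumes names are drawn uniformly without repeats within a trial.

```python
import random

def has_adjacent_start(surname: str) -> bool:
    """True if the first two letters are alphabet neighbours, in either order."""
    prefix = surname.strip().lower()[:2]
    if len(prefix) < 2 or not prefix.isalpha():
        return False
    return abs(ord(prefix[0]) - ord(prefix[1])) == 1

def run_experiment(name_pool, trials=1000, sample_size=10, threshold=4, seed=None):
    """Draw `sample_size` names per trial; keep the draws with >= `threshold` matches."""
    rng = random.Random(seed)
    hits = []
    for _ in range(trials):
        sample = rng.sample(name_pool, sample_size)  # no repeats within one trial
        if sum(has_adjacent_start(name) for name in sample) >= threshold:
            hits.append(sample)
    return hits

# Arbitrary stand-in pool, NOT the census data used by the real experiment.
stand_in_pool = ["Smith", "Johnson", "Williams", "Brown", "Jones", "Garcia",
                 "Miller", "Davis", "Wilson", "Anderson", "Thomas", "Taylor",
                 "Moore", "Jackson", "Martin", "Lee", "Stewart", "Edwards",
                 "Abbott", "Bailey", "Stone", "Clark", "Lewis", "Young"]

hits = run_experiment(stand_in_pool, trials=1000, seed=42)
print(len(hits), "of 1000 trials had at least four matching names")
```

Swapping in a real list of surnames for `stand_in_pool` is the only change needed to try it on better data.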

*Screenshot of experiment page*

By default, it only looks for the kind of pattern EPotts discovered. Now, let’s see what happens when we include more kinds of patterns. Click more of the checkboxes to select more patterns, then click the button again. You’ll see that more matches are displayed. In fact, if you select every checkbox, then around **40%** of the random trials will result in interesting patterns.

With some creativity, we could think of even more kinds of patterns that would stand out in a list of names. For example, what if the first letters of each name anagrammed to another name or interesting word? If such a pattern is discovered in a list of names, surely it would seem too unlikely to occur by random chance. But when all such patterns are considered, the odds are quite high that we’ll find *something* interesting.
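Here is a back-of-the-envelope sketch of why looking for more kinds of patterns raises the odds so quickly. It rests on two simplifying assumptions that are mine, not the experiment's: every kind of pattern is as rare as EPotts' pattern (about 1.28%), and the patterns are independent of each other. Real patterns will differ on both counts, so treat the numbers as a rough feel only. The function name `chance_of_any_pattern` is made up for illustration.

```python
def chance_of_any_pattern(p_single: float, num_patterns: int) -> float:
    """Chance that at least one of `num_patterns` independent patterns shows up."""
    return 1.0 - (1.0 - p_single) ** num_patterns

# Each hypothetical pattern is assumed to be as rare as EPotts' pattern.
for num_patterns in (1, 5, 10, 20, 40):
    chance = chance_of_any_pattern(0.0128, num_patterns)
    print(f"{num_patterns:>2} kinds of pattern -> {chance:.1%} chance of finding one")
```

The more kinds of pattern we are willing to call interesting, the closer the chance of finding one creeps toward certainty.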

If the Zodiac Killer *really did* select victims whose names formed a pattern, we’d need some stronger evidence to show that it was intentional. What if *all* of the victim names followed EPotts’ pattern? If they did, the odds of it happening by chance drop from **1.3%** to **0.00000001%** (that is, (1/10)^{10}), which is very compelling. But they don’t, so we’d need more proof, such as some other direct reference to the scheme, perhaps in Zodiac’s correspondence.

__WARNING: Here comes the boring math part.__

As mentioned earlier, if you pick ten last names completely at random, there’s about a **1.3% chance** that at least four of them will have this interesting pattern. How can we figure this out directly, instead of running an experiment to estimate the odds?

If you look at the census data, about 1 of every 10 names has EPotts’ pattern. So we can say that there’s a 10% chance that any given name will have it.

When we pick random names, we are filling 10 empty spots:

After filling the slots, let’s say we found 4 names that have the pattern, and 6 that don’t. We show names with the pattern as green, and the rest as red:

What is the probability of this exact arrangement of slots happening? Take a moment and think of a six-sided die. The probability of rolling a 2 is 1/6. The probability of rolling a 4 is 1/6. What is the probability of first rolling a 2, and then rolling a 4? Answer: 1/6 * 1/6 = 1/36.

Similarly, the probability of a green slot is 1 in 10. The probability of a red slot is 9 in 10. To figure out the probability for all slots, we multiply the probabilities: (1/10) * (1/10) * (1/10) * (1/10) * (9/10) * (9/10) * (9/10) * (9/10) * (9/10) * (9/10) = **0.0000531441**.

That tiny number is the probability of that one event happening. But there are more ways that four green slots can appear. They can be in different positions:

And there could be more than four green slots:

Again, think about the six-sided die. What is the probability of rolling a 3? Answer: 1/6. What is the probability of rolling a 3 OR rolling a 5? Answer: 1/6 + 1/6 = 1/3.

Similarly, we must add the probabilities for each of the possible events where at least 4 green slots appear. How many arrangements of the slots have at least 4 green slots? We have to use combinatorics to get the answer.

The number of ways to choose 4 slots from 10 is “10 choose 4”, or C(10,4), which is **210**.

So, we have to add that tiny probability 210 times. Thus the probability that selecting 10 random names will result in exactly 4 green slots and 6 red slots is: 210 * 0.0000531441 = **0.011160261** (or **1.12%**).
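If you'd like to check this arithmetic yourself, here is a small Python sketch. The variable names are made up for illustration, and the 1-in-10 chance is the rough census-based estimate from above, not an exact figure.

```python
from fractions import Fraction
from math import comb

p_green = Fraction(1, 10)   # chance a random name has the pattern (rough estimate)
p_red = 1 - p_green         # chance it does not

one_arrangement = p_green**4 * p_red**6   # one specific layout of 4 green, 6 red
arrangements = comb(10, 4)                # number of ways to place the 4 green slots
exactly_four = arrangements * one_arrangement

print(arrangements)            # 210
print(float(one_arrangement))  # 5.31441e-05
print(float(exactly_four))     # 0.011160261
```

Using `Fraction` keeps the arithmetic exact, so the printed values are not blurred by floating-point rounding.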

But we aren’t quite done yet. We also have to include situations where there are exactly 5 green slots, or 6 green slots, or 7 green slots, etc.

The probability of one particular arrangement with:

- Exactly 4 green slots and 6 red slots: (1/10)^{4} * (9/10)^{6} = **0.0000531441**
- Exactly 5 green slots and 5 red slots: (1/10)^{5} * (9/10)^{5} = **5.9049 × 10^{-6}**
- Exactly 6 green slots and 4 red slots: (1/10)^{6} * (9/10)^{4} = **6.561 × 10^{-7}**
- Exactly 7 green slots and 3 red slots: (1/10)^{7} * (9/10)^{3} = **7.29 × 10^{-8}**
- Exactly 8 green slots and 2 red slots: (1/10)^{8} * (9/10)^{2} = **8.1 × 10^{-9}**
- Exactly 9 green slots and 1 red slot: (1/10)^{9} * (9/10)^{1} = **9 × 10^{-10}**
- Exactly 10 green slots and 0 red slots: (1/10)^{10} * (9/10)^{0} = **1 × 10^{-10}**

Now we have to count all the ways for those events to happen:

- Ways for exactly 4 green slots and 6 red slots to appear: C(10, 4) = **210**
- Ways for exactly 5 green slots and 5 red slots to appear: C(10, 5) = **252**
- Ways for exactly 6 green slots and 4 red slots to appear: C(10, 6) = **210**
- Ways for exactly 7 green slots and 3 red slots to appear: C(10, 7) = **120**
- Ways for exactly 8 green slots and 2 red slots to appear: C(10, 8) = **45**
- Ways for exactly 9 green slots and 1 red slot to appear: C(10, 9) = **10**
- Ways for exactly 10 green slots and 0 red slots to appear: C(10, 10) = **1**

Then, to get the total probability that at least 4 green slots will appear among 10 random selections, we multiply the counts by the corresponding probabilities, and then add them all up:

Total probability =

210 * 0.0000531441 +

252 * 5.9049 × 10^{-6} +

210 * 6.561 × 10^{-7} +

120 * 7.29 × 10^{-8} +

45 * 8.1 × 10^{-9} +

10 * 9 × 10^{-10} +

1 * 1 × 10^{-10} = **0.0128** (or **1.28%**).
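The whole sum fits in a few lines of Python, for anyone who wants to verify it or try different numbers. This is only a sketch: the function name `prob_at_least` is made up, and the default 1-in-10 chance per name is the rough census-based estimate used above.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k_min: int, n: int = 10, p: Fraction = Fraction(1, 10)) -> Fraction:
    """Chance that at least `k_min` of `n` random names have the pattern."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

total = prob_at_least(4)
print(float(total))              # 0.0127951984
print(float(prob_at_least(10)))  # 1e-10
```

The first value rounds to the 1.28% above. The second is the chance that all ten names have the pattern.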

My apologies if these explanations are as cryptic as the ciphers!