Celebrity deaths: A statistical analysis
Twice before I’ve written about the “celebrity deaths come in threes” superstition, in 2008 and 2009.
With the recent passing of Art Linkletter, Gary Coleman, and Dennis Hopper, this superstition has again resurfaced.
I feel my previous arguments have already been quite persuasive, but now let’s add a statistical debunking.
To analyze the superstition, we need to define it. That includes two tasks:
- Who is and isn’t a celebrity
- The timeframe for the deaths to occur
I extracted the data of all 1,422 celebrity deaths that have occurred between January 1, 1995 and May 31, 2010 from a site called stiffs.com, which is the location of a death pool contest. (The contest has entrants predict which celebrities will die in the upcoming year and assigns points for correct guesses. Last year’s winner took away over $3,000.)
This addresses the first question, who’s a celebrity. At stiffs.com they have a panel of judges determine whether or not a person who passes away is famous, based on simply whether or not five or more members of the panel have heard of the person. They then create a list of celebrities ahead of time, and then monitor that list to see who has passed away.
You may well disagree with the fame assessments of stiffs.com. Certainly the data included plenty of people I personally had never heard of. But it’s a list that exists independently of the superstition, and is pre-existing, so it doesn’t suffer from the selection bias that arises when you assess whether or not a person is famous only after they have died.
As for the timeline, I decided to analyze it with as much leeway as possible. One day between each death? Up to two days? Three? Five? Seven? Who knows. I analyzed with a number of tolerance days all the way up to 10.
Before we get into the numerical analysis, let’s visualize the data.

[Figure: timeline of celebrity deaths, January 2004 through June 2010]
As you scroll back and forth in the listing of deaths from 2004 through today, your mind can certainly pick out groups of three. But is it ALL groups of three? Is it even MOSTLY groups of three? Your eyes already tell you the truth, that of course it’s not.
The numbers back up that visual refutation.
There are quite a few ways to analyze the data, and I tried to be comprehensive. Here are the approaches I took:
- Rolling timeline: This is probably the best method. (It was suggested by Patri.) When a death occurs, I start a counter. The counter lasts up to x days. (I analyzed with x from 1 to 10.) I keep track of how many celebrity deaths occur within that period. The counter resets after x days, and starts again whenever the next death occurs. With x at 7, for example, it’s basically an analysis of how many deaths a week, using rolling weeks.
- Continuous grouping: When a death occurs, start a count. Look at the next death. Is it within x days? If so, increment the counter. If not, start over at 1. Again, I analyzed with x ranging from 1 to 10.
- Separate tests: For each death, I calculate if it’s part of a group by looking at the date of death of the first member of the group, and see if it’s within x days of the last death. For the first death, it should be more than x days. For the subsequent deaths, it should be within x days. I then judge “pass” or “fail” for each death. I applied this analysis to groups of 1, groups of 2, groups of 3, groups of 4, groups of 5, and groups of 6. I also let it “roll” by varying where I started the counter. This analysis also looked at x ranging from 1 to 10.
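The first two approaches can be sketched in a few lines of Python. This is an illustration, not the spreadsheet logic itself: the function names and sample dates are made up, and it assumes that a rolling window of x days covers the day of the first death plus the following x - 1 days, and that "within x days" in continuous grouping means a gap of at most x days from the previous death.

```python
from collections import Counter
from datetime import date


def rolling_groups(dates, x):
    """Rolling timeline: a window opens at a death and stays open for x days.

    Assumption: a later death joins the window if it falls fewer than x days
    after the death that opened it. The next death after that opens a new window.
    """
    sizes = []
    window_start = None
    for d in sorted(dates):
        if window_start is not None and (d - window_start).days < x:
            sizes[-1] += 1
        else:
            window_start = d
            sizes.append(1)
    return sizes


def continuous_groups(dates, x):
    """Continuous grouping: a group keeps growing while each gap is <= x days."""
    sizes = []
    previous = None
    for d in sorted(dates):
        if previous is not None and (d - previous).days <= x:
            sizes[-1] += 1
        else:
            sizes.append(1)
        previous = d
    return sizes


def share_of_deaths_by_group_size(sizes):
    """Fraction of all deaths that fall in groups of each size."""
    total = sum(sizes)
    counts = Counter(sizes)
    return {size: size * n / total for size, n in sorted(counts.items())}


# Hypothetical dates for illustration only, not taken from the stiffs.com data.
sample = [
    date(2010, 1, 1), date(2010, 1, 2), date(2010, 1, 3), date(2010, 1, 4),
    date(2010, 1, 10), date(2010, 1, 20), date(2010, 1, 21),
]

print(rolling_groups(sample, 3))     # [3, 1, 1, 2]
print(continuous_groups(sample, 3))  # [4, 1, 2]
print(share_of_deaths_by_group_size(rolling_groups(sample, 3)))
```

Note how the same seven dates produce a group of three under one method and a group of four under the other, which is why I ran more than one method.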
So, what are the results?
For rolling timeline, we see the following results:
| Tolerance Days (x) | Groups of 1 | Groups of 2 | Groups of 3 | Groups of 4 or more |
|---|---|---|---|---|
| 1 | 75.7% | 19.0% | 4.6% | 0.6% |
| 2 | 47.6% | 35.3% | 12.4% | 4.6% |
| 3 | 28.3% | 40.8% | 20.0% | 10.8% |
| 4 | 18.8% | 39.8% | 23.6% | 17.8% |
| 5 | 12.7% | 31.9% | 23.8% | 31.6% |
| 6 | 9.7% | 26.0% | 24.7% | 39.6% |
| 7 | 7.5% | 22.9% | 25.1% | 44.5% |
| 8 | 6.0% | 18.0% | 22.6% | 53.4% |
| 9 | 4.7% | 14.9% | 20.5% | 59.9% |
| 10 | 3.7% | 12.1% | 19.4% | 64.8% |
No matter how many days of leeway you give, groups of three never actually best explain the data. If you give a lot of leeway, such as 10 days, larger groups occur. If you give only a little leeway, most deaths happen alone or in pairs.
The best performance for groups of three is when you allow a leeway of 7 days, but even then the superstition fits for just 25% of the deaths. (Groups of two deaths are not far behind, at 23%.) A superstition that’s only right one time out of four, and does no better than several variants of the superstition? Well, that’s not a useful superstition.
So, for this methodology, groups of three never really succeed. With 7 or more days of leeway, three is the average and median for groups of deaths, but only with a 23% success rate. No interpretation of this data with this method would lead one to agree that celebrity deaths come in threes.
For the second method, continuous grouping, the results are similar. You can get some pretty big groups with this method — using three tolerance days, the largest group turns out to be a group of 21 celebrity deaths. And with 10 tolerance days, the largest group is of 243 deaths!
However, no matter how many tolerance days you allow, groups of three never amount to more than 14.3% of all groups. So at best, groups of three explain about 1 death in 7 with this method.
The third method I used was separate tests. To be honest, this is a pretty stupid method, since if, say, two deaths in a group of three fit the pattern but one doesn’t, it still scores as two out of three when really the entire group should fail. And the groups are highly dependent on previous groups, so if there’s a missing celebrity or a person included who isn’t really a celebrity, it throws off the entire test.
Under this method, groups of three still score very poorly. No matter how many tolerance days you allow, from 1 to 10, it always turns out that some other grouping (such as groups of 2 or groups of 6) beats out groups of 3. Groups of 3 performed best with 10 days of tolerance, but with that high a tolerance, groups of 4, 5, or 6 fit even better. At most, 64% of celebrity deaths would pass a group of three test, but at the same time 72% fit a group of 4.
The data, analysis, and chart are all available for you to examine (Google docs share, 6.6 megs, Excel format).
If you ask me, the best method is the rolling timeline method, and the most reasonable number of days of tolerance is three. Going with that, we find that, on average, the group size is 1.7.
But “Celebrity deaths come in 1.7s” doesn’t have a winning ring to it.
June 2nd, 2010 at 9:25 am
I think there’s one more way to analyze the data:
What is the average number of days for a span of any 3 deaths?
The algorithm would be:
1. Start with the oldest death, D.
2. Move forward N deaths (N=2 to get to the 3rd death) to death D+2
3. Record the number of days between deaths D and D+2
4. Move forward to death D+1 and continue the algorithm at step 2.
Then, you could compute the mean, median, standard deviation, etc. for the number of days between 3 deaths. You could even use the result of this computation as an input to your other analysis.
I’m happy to code this up if you can provide a CSV or other easily parseable version of the data.
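The span algorithm in this comment could be coded roughly as follows. It is a minimal sketch: `span_days` and the sample dates are hypothetical, and the input is assumed to be a plain list of death dates.

```python
from datetime import date
from statistics import mean, median, pstdev


def span_days(dates, n=2):
    """Days between death D and death D+n, for every D in date order.

    n=2 gives the span covered by each run of three consecutive deaths.
    """
    ordered = sorted(dates)
    return [(ordered[i + n] - ordered[i]).days for i in range(len(ordered) - n)]


# Hypothetical dates for illustration only, not taken from the real data.
sample = [
    date(2010, 1, 1), date(2010, 1, 2), date(2010, 1, 3), date(2010, 1, 4),
    date(2010, 1, 10), date(2010, 1, 20), date(2010, 1, 21),
]

spans = span_days(sample)
print(spans)          # [2, 2, 7, 16, 11]
print(median(spans))  # 7
print(mean(spans), pstdev(spans))
```

Passing `n=1` gives the same statistics for pairs, which makes it easy to compare "twos" against "threes".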
June 2nd, 2010 at 9:50 am
Looking at your analysis, I find it unusual that the groups of 3 percentages, while never really being the most common, do remain strangely consistent across the 1-10 days leeway period.
The groups of 1 and 2 fall as you increase the number of days (as you’d expect) and the groups of 4+ rise along the same lines. But the groups of 3 number stays within 20-25%, for the most part.
Perhaps this is what gives rise to the belief: groups of three stay noticeable no matter how much leeway people allow in their minds. Noticing it happen 25% of the time is plenty for it to cross the “coincidence threshold” for most people.
June 2nd, 2010 at 1:19 pm
(Steve, see below, comment responded to on FriendFeed.)
Otto, I think you’re right that in general (using the rolling method, method 1) with a wide range for x, groups of 3 performs slightly less miserably than other groupings, usually at best 25%.
I suspect there’s a selective perception going on (where people remember the groups of 3 but forget the other times when there isn’t), and you’re right that 25% is high enough to reinforce the belief, if you already hold it.
June 1st, 2010 at 8:16 pm
A statistical refutation of http://friendfeed.com/davisfreeberg/e0e5f0f6/first-gary-coleman-now-dennis-hopper-who-will-be . The image above can be viewed full-size at http://www.zeigen.com/blog/wp-content/death-timeline-2004-01-to-2010-06.gif . My conclusion? Celebrity deaths come in one-point-sevens, not threes.
This comment was originally posted on FriendFeed
June 1st, 2010 at 1:17 pm
Awesome
This comment was originally posted on FriendFeed
June 1st, 2010 at 1:22 pm
Thanks, Chieze! I enjoyed crunching actual data.
This comment was originally posted on FriendFeed
June 1st, 2010 at 1:24 pm
Celebrities die / But do their deaths come in threes? / No. One point sevens. http://www.zeigen.com/blog/2010/06/celebrity-deaths-stats/
This comment was originally posted on Twitter
June 1st, 2010 at 9:33 pm
Debunking the “Celebrities die in threes” myth: http://bit.ly/93o5t2
This comment was originally posted on Twitter
June 1st, 2010 at 2:37 pm
This is why the Internet is great — debunking the myth that celebrities die in threes. http://bit.ly/93o5t2 (via @stiffsdotcom)
This comment was originally posted on Twitter
June 2nd, 2010 at 2:13 am
No refutations? Have I really killed it this time?
This comment was originally posted on FriendFeed
June 2nd, 2010 at 9:45 am
I’m on a plane!
This comment was originally posted on FriendFeed
June 2nd, 2010 at 2:51 am
See my blog comment — From some perspective, deaths *always* come in 3’s, you just have to see what the distribution of durations between deaths D and D+2 are.
I’d like to know: What’s the median (or 90, 95%ile) time for which "all deaths come in 3’s" is true.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 10:02 am
Nice analysis, tough to argue with the math. Perhaps we are looking at this from the wrong perspective. According to the World Health Organization 1.2 million people die per year. That would make 400,000 sets of 3 with the celebs getting the attention when it happens to them for their 25%.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 3:03 am
BTW, would also be nice to compare vs. a random control sample of a similar distribution to actual celebrity death data.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 10:15 am
The spreadsheet is pretty slick, there is all kinds of interesting data. There must have been some kind of cosmic event in July of 99 because 21 celebs kicked the bucket in that month.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 10:28 am
Steve, I’ve seen your blog comment now, and you’ve proposed an interesting approach. When I get some time I’ll post the CSV data and run the analysis you suggest. However, you can tell at a glance from just the visual representation that the duration between D, D+1, and D+2 will vary wildly.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 10:29 am
Davis, how do you propose testing for the global population? What groupings of the deaths do you think exist? Geography?
This comment was originally posted on FriendFeed
June 2nd, 2010 at 10:30 am
Also, Davis, you’re right about July 99. You’ve got me thinking of another simple analysis: Number of deaths per month, and what percentage of the time that number is evenly divisible by three.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 3:30 am
Steve, for the control data, what do you suggest? National obituaries? Death records for all inhabitants of a particular county?
This comment was originally posted on FriendFeed
June 2nd, 2010 at 10:33 am
No, just a random distribution. You know how many deaths occurred over your interval, so just simulate it assuming each person has a 1/Nth chance of dying each day (where N is the number of days in the sample).
This comment was originally posted on FriendFeed
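The random control suggested here might look something like the sketch below. Everything in it is an assumption apart from the 1,422 deaths: the day count is my own tally of January 1, 1995 through May 31, 2010, the function names are invented, and the grouping rule is one reading of the rolling timeline method (a window covers fewer than x days from the death that opened it).

```python
import random

NUM_DEATHS = 1422  # from the post
NUM_DAYS = 5630    # assumption: days from 1995-01-01 through 2010-05-31, inclusive


def simulate_death_days(num_deaths, num_days, rng):
    """Place each death on a uniformly random day index in [0, num_days)."""
    return sorted(rng.randrange(num_days) for _ in range(num_deaths))


def rolling_group_sizes(days, x):
    """Group sizes using a window of x days opened by the first death in it."""
    sizes = []
    window_start = None
    for day in days:
        if window_start is not None and day - window_start < x:
            sizes[-1] += 1
        else:
            window_start = day
            sizes.append(1)
    return sizes


def share_in_groups_of(size, sizes):
    """Fraction of all deaths that fall in groups of exactly `size`."""
    return sum(s for s in sizes if s == size) / sum(sizes)


rng = random.Random(0)  # fixed seed so the run is repeatable
simulated = simulate_death_days(NUM_DEATHS, NUM_DAYS, rng)
sizes = rolling_group_sizes(simulated, x=7)
print(f"simulated share of deaths in groups of three: {share_in_groups_of(3, sizes):.1%}")
```

Running this for each x from 1 to 10, over many seeds, would give a baseline to set beside the real percentages.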
June 2nd, 2010 at 10:34 am
And yup, I know the distribution will be pretty wide, and that’s what’s interesting about it.
You should be able to compare that with the random distribution to see if there’s any difference.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 3:45 am
Man, this conversation reminds me of the stuff I liked about my stats class…
This comment was originally posted on FriendFeed
June 2nd, 2010 at 3:52 am
Testing the global population would be tough, but that’s the beauty of this argument. It goes back to your point about never being able to disprove it. I was thinking about running the monthly numbers too, but after a casual glance I’m thinking the average will come in around 5.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 1:11 pm
Steve, I took a look at the suggestion of days between D and D+x as you suggested in more detail in your blog comment. For every metric (average days, median, etc.) the data is better explained by groups of 2 than groups of 3. Performance for all groupings was pathetic, with an average of at least 8 days overall even for groups of 2. The standard deviation was at least 4.5 days, which backs up my previous prediction that it would vary wildly.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 1:12 pm
Davis, I took a look at number of deaths per month as well. The number of months where the total deaths was divisible by three was… wait for it … 38.2%, close to expected value of 33.3%. Average number of deaths per month is actually 7.7.
This comment was originally posted on FriendFeed
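The per-month check can be sketched like this. The function name and sample dates are hypothetical, and the sketch only counts months that contain at least one death; an empty month would need separate handling, since zero is also divisible by three.

```python
from collections import Counter
from datetime import date


def share_of_months_divisible_by_three(dates):
    """Fraction of months (with at least one death) whose count is a multiple of 3."""
    per_month = Counter((d.year, d.month) for d in dates)
    hits = sum(1 for count in per_month.values() if count % 3 == 0)
    return hits / len(per_month)


# Hypothetical dates for illustration only: 3 deaths in January,
# 2 in February, 6 in March.
sample = [
    date(2010, 1, 1), date(2010, 1, 5), date(2010, 1, 20),
    date(2010, 2, 2), date(2010, 2, 14),
    date(2010, 3, 1), date(2010, 3, 2), date(2010, 3, 3),
    date(2010, 3, 4), date(2010, 3, 5), date(2010, 3, 6),
]

print(share_of_months_divisible_by_three(sample))  # 2 of 3 months qualify
```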
June 2nd, 2010 at 1:12 pm
A revised version of the spreadsheet with new tabs ("Avg Days of Span" and "Calendar" respectively) for the two new methods is now available, http://docs.google.com/fileview?id=0BwaJ24_ZGjhGMDMyODljNjYtNjI1Ni00MWZjLThhYTQtMzQwMjIxNjAxNjFj&hl=en
This comment was originally posted on FriendFeed
June 2nd, 2010 at 1:13 pm
I’ve also provided a data file called CORPUS.TXT with just the names and dates, in CSV format: http://docs.google.com/leaf?id=0BwaJ24_ZGjhGYWI5MjMwNzItMjViYi00YjRiLTliMzktYjg3YTc2OGM4ZDA5&hl=en
This comment was originally posted on FriendFeed
June 2nd, 2010 at 6:49 am
If I’m reading that right, the ~50%ile of the number of days between D and D+2 is 7. To me, that says "half the time, 3 (or more) celebrities have died in the same week" That seems like the kind of thing that would easily turn into an urban myth. If you go out to 14 days, you get to nearly 87%!
This comment was originally posted on FriendFeed
June 2nd, 2010 at 8:08 am
Steve, yes. But 50% success is awful. Another way of saying that is, "half the time, when a celebrity dies, only 1 or 2 celebrities will die that week." So the myth at BEST is 50% right, when you allow a FULL WEEK for the deaths to coincide. You’d do better if the myth is that "celebrity deaths come in 2s" — about half the time, 2 celebrities die in a period just 2 days apart.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 3:14 pm
Put another way: Using your method, "celebrity deaths come in twos" is more accurate than "celebrity deaths come in threes." So the superstition is wrong.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 3:16 pm
For an urban myth, I think 50% success rate is pretty good.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 8:29 am
It’s not 50%. To get 50%, you have to include 3 OR MORE (so, sometimes 4, sometimes 5, sometimes 10) — and that’s with AN ENTIRE WEEK of allowance. Remember, other variations of the myth (celebrities die in 2s, celebrities die in 1.7s) do better . You’ve seen the spreadsheet. There’s no method where "celebrities die in 3s" has any statistical validity whatsoever. This myth has been 100% debunked.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 8:44 am
The idea of 70% dead celebrity is kind of gruesome
(He’s only mostly dead….)
This comment was originally posted on FriendFeed
June 2nd, 2010 at 8:55 am
He got better.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 4:13 pm
Send this data to Mythbusters!
This comment was originally posted on FriendFeed
June 2nd, 2010 at 9:15 am
Unless the Mythbusters team gets to actually blow up the celebrities to test whether they clump together in threes or not, I don’t think this myth is quite telegenic enough for them.
This comment was originally posted on FriendFeed
June 2nd, 2010 at 4:43 pm
Slacy created a graph in a separate thread here: http://friendfeed.com/slacy/443aaaff/some-further-graphs-for-do-celebrities-die-in-3
This comment was originally posted on FriendFeed
June 2nd, 2010 at 9:44 am
We’ll put it to the internet and get a definitive answer. Today’s Ask500 question is do celebrities die in 3’s? http://www.ask500people.com/questions/do-celebrities-die-in-groups-of-three
This comment was originally posted on FriendFeed
June 2nd, 2010 at 10:18 am
50/50 after 4 votes. Please go and vote your conscience!
This comment was originally posted on FriendFeed
June 3rd, 2010 at 3:35 am
I think what’s missing is a solid theory of celebritigenesis. Without that, it’s hard to figure out what the null hypothesis would be.
This comment was originally posted on FriendFeed
June 3rd, 2010 at 4:09 am
Rue McClanahan makes three! :-P Actually Stephen, I’ve got some interesting analysis to dovetail with your own that I hope to write up in the next few days.
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:18 pm
Victor, I think stiffs.com’s approach is good. A celebrity is hard to define, but (like many other things) we know one when we see one. Having an objective panel that independently answers "have you heard of this person?" is about as good a criterion as I can think of.
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:19 pm
Kevin, I thought it was supposed to be Art/Gary/Dennis and now Rue starting a new series? This superstition is so hard to nail down! But regardless, I look forward to your analysis.
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:27 pm
Stephen, I guess that’s just it–where exactly is the threshold for celebrity vs non-celebrity? Are you famous if 1 million people know you, but not if only 999,999 people know you? I guess my question is, what do we actually think we’re measuring?
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:32 pm
Art isn’t famous to me, so it’d be Gary, Dennis and Rue.
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:32 pm
Victor, it’s an excellent question. If you want to create your own personal list of living celebrities, we can then monitor that list as they pass away and test whether or not they die in threes (whatever THAT means), and then judge whether or not the superstition is true for YOU.
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:32 pm
Kevin, funny, I’d never heard of Rue before today. (Never watched Golden Girls.) It proves Victor’s point. The subjective nature of the superstition is what makes it hard to prove or falsify.
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:38 pm
The subjective nature makes it much easier to prove, just harder to tell whether it’s meaningful. If, in any string of 10 deaths, there are clumps of three that are meaningful for one person that person will claim it to be an instance of the ‘death in 3s’ phenomenon, and for them it will be right. They’ll have heard of several other instances of ‘death in 3s’ from their friends over the years, which will have been equally valid for those friends. For each individual person a death triplet may not come along more often than statistics would predict, but when we hear our friends telling us about it when it happens it makes it seem more frequent. It’s like when your friend has a winning lottery ticket and tells you about it, you get the idea that winning has a higher likelihood than it actually does because of the positive communicative bias. Similarly, I can say this last week has been a ‘celebrity death sequence of three’ instead of four, because I didn’t know who Art was. To me it’s true and I tell my friends about it and…
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:43 pm
Now that I think of it, the null hypothesis would be thus: if celebrities *don’t* die in clumps of three, then there should be roughly equal spacing between their deaths (+/- some error), regardless of what time frame you choose. So all you have to do is pick a threshold duration, wait for celebrities to die, and see if there’s a statistically significant difference between the actual death rate versus if they just died at regular intervals instead.
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:47 pm
So say we pick a duration of a week. If the null hypothesis is correct, then they should each die roughly within 2 days and 8 hours of each other. If it’s tighter than that (we’d have to do the math to see at what point it would be statistically significant) then the alternative hypothesis is true, and they *do* die in clumps.
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:49 pm
Kevin, exactly right. My first debunking (from 2008) used the same argument: "This belief is an example of selective perception (http://en.wikipedia.org/wiki/Selective_perception); You tend to remember the times when there was a grouping of three seemingly-related deaths in a short period of time, which reinforces the belief, but tend to forget the times when there wasn’t a pattern. As I’ve said before, our brains are remarkably good similarity detectors (whereas computers are excellent difference detectors; this is why captchas work), so we often find ourselves picking out plausible-seeming patterns to events that ultimately are chaotic and unpredictable."
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:51 pm
Another psychological factor to consider is mental temporal boundaries. If the weekend is a memory reset, where things on the far end of a weekend feel more removed, then three deaths in a work week, or in a single weekend, could be perceived as a clump even if the boundary between the first and the one previous to it is less than the boundary between the first and the third, just because the one before the work week ‘feels significantly longer ago’
This comment was originally posted on FriendFeed
June 3rd, 2010 at 5:57 am
Victor, the raw data is available. (See links above.) There are 1,422 deaths over 5,669 days. So you expect the average death to be about 4 days apart, which in fact is what the data shows. The null hypothesis you propose in fact has strong significance in the data.
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:57 pm
Victor, the raw data is available. (See links above.) There are 1,422 deaths over 5,600 days. So you expect the average death to be about 4 days apart, which in fact is what the data shows.
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:59 pm
So there you have it. Proof that celebrities don’t die in clumps but actually die at regular intervals.
This comment was originally posted on FriendFeed
June 3rd, 2010 at 12:59 pm
Thank you, sir! *shakes hands with Victor*
This comment was originally posted on FriendFeed