On January 28, 1986, the Space Shuttle Challenger exploded 73 seconds after liftoff, killing seven crew members and traumatizing a nation.

Millions of viewers (including many schoolchildren) watched the launch live, partly because Christa McAuliffe, a social studies teacher selected to be the first teacher in space, was on board. (See a video of the tragic launch here)

Image: The Final Crew of the Space Shuttle Challenger via Wikipedia

The cause of the disaster was traced to an O-ring, a circular gasket that sealed the right rocket booster. It failed because of the low temperature (31°F / -0.6°C) at launch time – a risk that several engineers had flagged but that NASA management dismissed. NASA’s own pre-launch estimate put the chance of shuttle failure at 1 in 100,000 for any given launch – and poor statistical reasoning was a key reason the launch went ahead.

Note. Before and after shuttle explosion (first visible signs of danger on left, just after explosion on right). Via Wikipedia and NASA.

In 1989, Siddhartha Dalal (this author’s father), Edward Fowlkes, and Bruce Hoadley wrote a paper (“Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure”) that analyzed the data available before launch to determine whether the failure could have been better predicted.

Using standard statistical techniques on previous launch data, they determined that the evidence was overwhelming that launching at 31°F would carry a substantial risk of failure. They estimated a ~13% likelihood of O-ring failure at 31°F, compared to NASA’s general shuttle failure estimate of 0.001% and a 1983 US Air Force study that put the failure probability at 3-6%. Would you let the shuttle launch if you knew there was a 1 in 8 chance it would fail? What about a 1 in 100,000 chance?
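To get a feel for the size of that gap, here is a minimal Python sketch that converts the two probabilities quoted above into "1 in N" odds and computes their ratio. The variable names and the `one_in` helper are my own, made up for illustration. Keep in mind that the comparison is rough: one figure is an O-ring failure likelihood at 31°F, the other a general per-launch shuttle failure estimate.

```python
# Probabilities quoted in the text above.
p_dalal = 0.13        # ~13% likelihood of O-ring failure at 31°F (Dalal et al.)
p_nasa = 1 / 100_000  # NASA's pre-launch estimate of shuttle failure per launch


def one_in(p):
    """Express a probability as the N in '1 in N', rounded to the nearest integer."""
    return round(1 / p)


# How many times larger the post-hoc estimate is than NASA's figure.
ratio = p_dalal / p_nasa

print(f"Dalal et al.: about 1 in {one_in(p_dalal)}")
print(f"NASA:         about 1 in {one_in(p_nasa):,}")
print(f"Ratio:        roughly {ratio:,.0f}x")
```

The two estimates differ by about four orders of magnitude, which is the difference between a risk you would never accept and one you would barely notice.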

Let’s look at the data that NASA analyzed and see where things went wrong.

Both the post-disaster presidential commission report and Risk Analysis of the Space Shuttle highlighted NASA management’s use of pre-launch data showing the number of O-ring failure incidents versus temperature in tests.

Below is the key graph of the O-ring test data that NASA analyzed before launch. Take a look and see if you can spot any pattern between temperature on the day of the test and O-ring failure rate. If you were the decision maker for launch and only had this graph, would you have allowed the space shuttle to launch at 31°F, the temperature on the day of the launch?

Number of O-Ring incidents vs. Joint Temperature
(Incidents when O-Rings failed)

Source: Report of the Presidential Commission on the Space Shuttle Challenger Accident, 6 June 1986, Volume 1, Page 145. Color added.

NASA management used the data behind this first graph (among many other pieces of information) to justify their view the night before launch that there was no temperature effect on O-ring performance (despite the objections of the most knowledgeable engineers who had run many other experiments). In this graph specifically, it’s hard to find any consistent relationship between temperature and failure rate in the provided data.

But NASA management made one catastrophic mistake: this was not the chart they should have been looking at.

NASA looked at the times when the O-rings failed but left out the times when they held. Had there been many successful launches in one temperature range and none in another, the full data would have quickly shown the danger.

Now look at a graph of the full data set (this time including successes, rather than just the failures). Do you see any pattern between temperature and failure rate? Would you still allow the space shuttle to launch at 31°F if this were the only information you had?

Number of O-Ring incidents vs. Joint Temperature
(failures AND successes)

Source: Report of the Presidential Commission on the Space Shuttle Challenger Accident, 6 June 1986, Volume 1, Page 145. Color added.

Successful launches (those with no failure incidents) had not been listed in the first data set we saw. Had they been included, most people would have concluded there was a definite temperature effect.

Of the many launches at high temperature (>65°F) in the second graph, only a small share (15%) had O-ring problems. Of the very few launches at low temperatures, 100% had O-ring problems (and these covered only the range between 50°F and 65°F, not the even colder 31°F at launch). If the data behind this second graph had been used by NASA management, it’s more likely that the launch would have been postponed (though some think even this wouldn’t have been enough).
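The effect of dropping the successes can be sketched in a few lines of Python. The flight records below are hypothetical: I invented the temperatures and incident counts so that they reproduce the percentages quoted above (15% of launches above 65°F with problems, 100% below), and they are not NASA's actual data. The function name and the 65°F split are likewise my own choices for illustration.

```python
# HYPOTHETICAL records: (joint temperature in °F, number of O-ring incidents).
# Invented to match the percentages quoted in the text; NOT the real flight data.
cool_flights = [(52, 1), (56, 1), (58, 1), (63, 1)]        # all had incidents
warm_with_incidents = [(70, 1), (72, 1), (75, 1)]          # 3 warm flights with incidents
warm_without_incidents = [(t, 0) for t in range(66, 83)]   # 17 warm flights, no incidents

all_flights = cool_flights + warm_with_incidents + warm_without_incidents
# The "first graph" view: keep only flights where something went wrong.
failures_only = [flight for flight in all_flights if flight[1] > 0]


def incident_rate_by_band(flights, threshold=65):
    """Return {band: (flights_with_incidents, total_flights)}, split at threshold °F."""
    bands = {"cool": [0, 0], "warm": [0, 0]}
    for temp, incidents in flights:
        band = "warm" if temp > threshold else "cool"
        bands[band][1] += 1
        if incidents > 0:
            bands[band][0] += 1
    return {name: tuple(counts) for name, counts in bands.items()}


for label, data in [("failures only", failures_only), ("all flights", all_flights)]:
    for band, (bad, total) in incident_rate_by_band(data).items():
        print(f"{label:14s} {band}: {bad}/{total} flights with incidents ({bad / total:.0%})")
```

With the failures-only subset, both temperature bands look identical because every flight in the subset had an incident by construction. Only when the incident-free flights are restored does the gap between the cool and warm bands appear.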

This comparison highlights just how important having a full set of data is – and how it might literally mean the difference between life and death. You can’t just look at the problem events; you also have to look at the cases where no problem occurred.

We might visualize the difference between the two graphs as follows, with the data leading to inferences that impact the decision about whether the space shuttle should have launched:

***

I drew a key lesson from the Challenger disaster: looking at selective data that excludes critical information leads to flawed decision-making.

As humans, when we seek to make sense of a large number of data points, we construct a limited data set – we observe or include only a subset of all data points through a process known as sampling. This can lead to a selective, biased sample if the subset misrepresents the underlying data. In the Challenger’s case, selecting only the failures led to a flawed graph and, in turn, a flawed decision.

This selectivity is not confined to NASA; it applies to everything from assembly line defects to academic research.

My particular interest is how troublesome selective data has been in media, especially with the growth of social media. In the next post in this series, we’ll look at the challenges of selective media coverage, seeing how it leads us to bad inferences and distorts the decisions we then make.

Selectively analyzing data, as in the case of the Challenger, is a pervasive problem, but it’s particularly common in media. For example, the media reports when bad things happen (like a rare terrorist attack) but doesn’t report when they don’t happen, because the absence of an event isn’t newsworthy. As media consumers, however, we use this coverage to form opinions and make life decisions (like voting, determining personal safety, convincing our friends of certain beliefs) without realizing how selective this coverage is; this distorts how we view the world and leads us to make flawed decisions.

Over this series, we’ll look at a substantial number of visualizations to see the inferences we would make with selective media coverage – and compare that to the inferences we’d make with the entire dataset, just like we did with the Space Shuttle Challenger.

***

This was a post by Nemil Dalal and is Part 1 in a series where he looks at data to identify media selectivity and see its effect on decisions. To get notified when the next post in this series is published, sign up for Nemil’s email list.