When a data point is supported mostly by anecdotes

Engineers are surrounded by data, and that's probably a good thing. Using this data, we can look for patterns, trends, bugs, and performance attributes. Sometimes we have too much data, including some that is in conflict with itself, but that's part of the puzzle that a good engineer can spot and hopefully unravel.

I wish the mainstream press would pay some attention to data, but I fear it's not going to happen. A reporting trend that I have noticed getting much worse over the last few years is the use of a factoid plus anecdotes, purporting to provide some serious insight into a major development or trend. These so-called insight stories lean heavily on a single piece of data (or perhaps two), often taken out of context, to present some hypothesis and then buttress it with all sorts of anecdotal evidence, which consists mostly of self-serving quotes from parties who have a vested interest in the outcome.

When you rinse away all the unsupported and unsupportable statements, you are left with a tiny little grain. It's like chemistry class in high school, where you would have a container of some semi-known mixture, then rinse it through with the right reagent, and if you were lucky, you'd have a speck of the desired precipitate at the bottom. Or maybe it was just an impurity?

The most glaring example I have seen lately of a statistic pretending to contain great truth was an article in the New York Times (January 16, 2007, “51% of Women Are Now Living Without Spouse”). This article, with its eye-catching headline, went on for paragraphs and paragraphs about why this might be so, the implications for society, and more. It also got lots of page views and pickup by other media outlets, which means it was a good story almost by definition.

The only problem was that the key statistic was quite flawed. Buried in the article was the dirty little secret. The figure was based on census data (probably as good as you're going to get) but included women 15 years and older. Excuse me? It seems that including the marital status of women under 18 or 21 years old would seriously skew the results and render the trumpeted conclusion invalid.
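To see why the age cutoff matters, here is a minimal sketch of the arithmetic. The counts below are invented purely for illustration and are not census figures, and the function name `share_without_spouse` is likewise hypothetical. It only shows how folding in a group that is almost entirely unmarried can push an overall share across the 50% line.

```python
def share_without_spouse(groups):
    """Return the fraction of women living without a spouse across all groups.

    Each group is a (total, without_spouse) pair of counts.
    """
    total = sum(t for t, _ in groups)
    without = sum(w for _, w in groups)
    return without / total


# Invented counts for illustration only (NOT census data).
adults = (1000, 480)  # hypothetical: women 18 and older
teens = (100, 98)     # hypothetical: women aged 15 to 17

adults_only = share_without_spouse([adults])
with_teens = share_without_spouse([adults, teens])

print(f"adults only:  {adults_only:.1%}")
print(f"15 and older: {with_teens:.1%}")
```

With these made-up numbers the share moves from 48.0% to 52.5%, so the "majority" in the headline exists only because of where the age line was drawn.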

But never mind. The story was “supported” by all sorts of local anecdotes, as if interviewing ten people at random (whatever “random” means in this context) substantiates the alleged fact. It's similar to when marketing folks extrapolate from a single data point, or use a baseline statistic that is so small that it results in an almost meaningless trend line. But, hey, if that's what it takes to get attention, make a point, or get investors, that's OK!
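The small-baseline problem is easy to demonstrate with a few lines of arithmetic. In this sketch the unit counts and the function name `percent_change` are made up for illustration; the point is only that the same formula yields a dramatic figure when the starting number is tiny.

```python
def percent_change(old, new):
    """Percent change from old to new; old must be nonzero."""
    if old == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return (new - old) / old * 100.0


# Hypothetical unit sales, invented for illustration.
small_base = percent_change(2, 5)        # 3 extra units on a base of 2
large_base = percent_change(2000, 2300)  # 300 extra units on a base of 2000

print(f"small baseline: {small_base:.0f}% growth")
print(f"large baseline: {large_base:.0f}% growth")
```

Here "150% growth" describes three additional units, while the far less exciting 15% describes three hundred.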

The New York Times story was debunked within a few days by readers and population experts who saw the obvious flaw. Even the Times' Public Editor (aka ombudsman) agreed that readers were poorly served by this sloppy use of data. Not surprisingly, the article's authors felt it was a reasonable story, regardless of the facts.

What does this have to do with engineers? It gets back to the basics. When you are investigating a technical situation, of course you'll collect data and facts. And despite the pressure to move forward, more of these are usually better, if only to allow some cross-checking. Then assess the circumstances under which the information was collected, to see if its frame of reference is perhaps misleading. System results at 25 degrees C, for example, may not reveal what you need to know at lower or higher temperatures, and standard adjustments for temperature may be insufficient.
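As a rough sketch of the temperature point, the snippet below compares a standard first-order (linear) temperature adjustment against an invented device that also has a small second-order term. Every number here, and both function names, are assumptions chosen for illustration, not measurements of any real part.

```python
def linear_adjust(value_25c, tempco_ppm, temp_c):
    """First-order correction using a constant ppm-per-degree coefficient."""
    return value_25c * (1 + tempco_ppm * 1e-6 * (temp_c - 25.0))


def hypothetical_device(value_25c, tempco_ppm, curvature_ppm, temp_c):
    """Invented 'actual' behavior: linear term plus a second-order term."""
    dt = temp_c - 25.0
    return value_25c * (1 + tempco_ppm * 1e-6 * dt + curvature_ppm * 1e-6 * dt * dt)


# Hypothetical part: 10,000 units at 25 C, 100 ppm/C, 0.5 ppm/C^2 curvature.
for temp_c in (-40.0, 25.0, 85.0):
    predicted = linear_adjust(10_000.0, 100.0, temp_c)
    actual = hypothetical_device(10_000.0, 100.0, 0.5, temp_c)
    gap = actual - predicted
    print(f"{temp_c:6.1f} C  linear={predicted:.2f}  device={actual:.2f}  gap={gap:.2f}")
```

The two models agree exactly at 25 degrees C and drift apart toward both temperature extremes, which is the kind of gap that data taken only at room temperature cannot reveal.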

Don't let the danger of data overload drive you in the other direction, towards skimping on data. People who study the scientific process know that one pattern repeats itself fairly consistently: we see what we expect to see or want to see. Having more data can provide you with some unexpected variations and anomalies that you'll need to check out more carefully, or at least make you stop and think about what they mean.
