Big Data Needs Context

I think most people would generally agree with the title for this post, so there might not be much of a debate to be had today, at least not in the way some of my other posts are structured. But I couldn’t pass up the opportunity to share some interesting examples I learned about in a TED Talk I just watched (“What do we do with all this big data?” by Susan Etlinger, September 2014) that make you realize just how important context is in data science.

#1: What the CDC learned about the word “smoking.”

Etlinger walks us through one of the CDC’s studies, which was meant to build a better understanding of public perceptions of the word “smoking.” To gather this data, the CDC analyzed how “smoking” is referred to on Twitter, a popular social media app with hundreds of millions of monthly active users. They found that “smoking” can mean one of four things:

  • Smoking cigarettes
  • Smoking marijuana
  • Smoking ribs
  • Smoking hot women

These results are not only hilarious but also proved incredibly helpful. You see, the CDC’s entire goal with this study was to understand how they could put out more effective anti-smoking ads, and with the above data, they were able to avoid showing ads to audiences who weren’t actually searching for anything about cigarettes. Instead, they could focus their campaigns on people genuinely engaging with tobacco-related content. Pretty interesting, right?
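To make the idea concrete, here is a minimal Python sketch of how a post mentioning “smoking” could be sorted into one of the four senses above by looking at the words around it. To be clear, this is not the CDC’s method (the talk doesn’t describe their pipeline); the cue lists, the `classify_smoking` function, and the sample posts are all made up for illustration, and a real system would need something far more robust than keyword matching.

```python
import re

# Hypothetical context cues for each sense of "smoking".
# These word lists are invented for illustration, not taken from any study.
CONTEXT_CUES = {
    "cigarettes": {"cigarette", "cigarettes", "pack", "quit", "nicotine", "tobacco"},
    "marijuana": {"weed", "joint", "marijuana", "blunt"},
    "ribs": {"ribs", "brisket", "bbq", "grill"},
    "attractive": {"hot", "gorgeous", "stunning"},
}


def classify_smoking(text):
    """Guess which sense of "smoking" a post uses, based on nearby words.

    Returns None if the post never mentions "smoking", and "unknown"
    if none of the context cues appear.
    """
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    if "smoking" not in tokens:
        return None
    # Count how many cue words for each sense show up in the post.
    scores = {sense: len(tokens & cues) for sense, cues in CONTEXT_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    posts = [
        "Trying to quit smoking, down to one pack a week",
        "Smoking ribs all afternoon for the bbq",
        "She looked smoking hot tonight",
    ]
    for post in posts:
        print(classify_smoking(post), "<-", post)
```

The point of the sketch is simply that the word “smoking” alone tells you nothing; it’s the surrounding words that decide whether a post belongs in an anti-smoking campaign’s audience.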

#2: The harm in overvaluing certain metrics in autism cases.

Etlinger also shares a personal story about her son, who has autism. He’s non-verbal, which is considered a highly “telling” metric in autism cases. So, one day, when Etlinger found him looking things up on Google, which was his way of compensating for his verbal disability in day-to-day life, she was surprised to learn just how far her son’s creativity and problem-solving skills could carry him. She realized that overvaluing one metric while neglecting others can lead to harmful assumptions, and in the case of autism specifically, it can lead to some pretty inaccurate diagnoses.

Etlinger explains just how important it is to avoid confirmation bias by asking ourselves, “Did the data show us this, or does the result make us feel more successful and more comfortable?” In other words, always look over data with a critical eye, because an added layer of skepticism can help us better understand the different contexts in which data might exist.

I know most of us probably already understand the importance of context, but I hope these examples help drive home the message: big data may give us scale, but only context gives us sense.
