Does More Data Really Show Us More?

I think a lot of people would agree with me when I say that “big data allows us to see more.” But what does “more” even mean here? My problem with this phrasing is that it makes it sound like more data literally expands the “image” of insights for us. And it’s not that this isn’t the case…it’s just that this phrasing oversimplifies what data really does for us. More data might expand the image, but what it’s much more likely to do is show us more within the image we already had, sometimes even revealing parts of the image we didn’t realize were there.

Let’s pause. That might have been a bit confusing, so I’ll explain this again with a visual analogy. Take a look at the following picture (and be kind, I drew this myself…I’m a data scientist in the works, not an artist).

Let’s call this our baseline, representing what we can “see” with a certain starting amount of data. We can make some assumptions from this amount of data:

  • This seems to be in nature somewhere.
  • The weather seems to be pretty good.
  • There are a lot of what seem to be trees (I use the word “seem” here not because I consider my drawing so bad that it’s unidentifiable, but because in data science, we can never be 100% sure of an insight or prediction).
  • There doesn’t seem to be much else going on.

Now what if we added more data? How would that change what we see? A lot of people think we’d see the following…

And here is where my entire point lies. If you’re under the assumption that more data shows us more in scale (like in the picture above: more trees, more sky, more grass), then you’re missing the beauty of data entirely. Here’s what it would really look like if we added more data…

Did you get what I’m trying to say? Adding data didn’t expand the size of our picture, but it did offer significantly more insight into our original picture. Compared to what we started with (just trees, grass, and sky), we can now make all sorts of assumptions that we couldn’t make before:

  • There seems to be a park specifically (instead of what we could only speculate to be “nature” from our previously limited amount of data).
  • The weather seems to be cloudy specifically (instead of what we could only speculate to be “good” weather from our previously limited amount of data).
  • The park seems to have a slide and a swing.
  • The park seems to be covered in mulch, with wooden boundaries.
  • There seem to be two children at the park.
  • There seems to be a boy going down the slide.
  • There seems to be a girl on the swings.
  • There seems to be an owl in the hollow of one of the trees.
  • There seems to be a man with his dog (…training? playing fetch? going for a walk?)
  • There seems to be some other plant life (…ferns? bushes? tall grass?)

Essentially, more data doesn’t just add to or extend our original insights; it opens up entirely new ones. So to answer the question posed in this post’s title, does more data really show us “more”? Yes, absolutely. More data does show us more, but you have to be careful with what “more” means.
