Big Data: Unstructured vs. Structured Data

Part 1 in a Series on Big Data


What is big data? Why is big data here? What does big data do? Replace all instances of “big data” with “the cloud” and you’ll see the recurring phenomenon of new technology entering the limelight despite most people not understanding what it is. Usually technology is quite easy to understand: POS scanners track sales, Excel analyzes data, and iPods play music. So why is big data so hard?

Big data is hard to understand because it really isn’t anything, per se. Big data, quite literally, is just storing large amounts of data. Lots of it. Of all different shapes and sizes. And it grows. Every day. As storage costs go down (and much of it moves to…the cloud), businesses are more apt to store stuff. Stupid, dumb stuff like tweets, customer reviews, videos and more. Why is the data dumb? It’s probably not the best characterization, but in order to explore “dumb data”, let’s talk unstructured versus structured data.

Structured Data:

If you can put it into Excel, it’s probably structured. Think of a typical sales spreadsheet for your favorite candy store. It shows each candy, how much was sold, at what price and so on. It has SKUs, prices, quantities and dates, all nicely formatted. For marketing professionals, you’d use SCAN*PRO on it. For data nerds, you’d pull it out of a SQL database with joins and where clauses. For the sales team, you’d take it out to dinner. (I’m assuming; I’m not in sales.)
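For the data nerds, here is a rough sketch of what “joins and where clauses” might look like for that candy store. The table names, columns and sample rows are all made up for illustration, and I’m using Python’s built-in sqlite3 module with an in-memory database so it runs anywhere.

```python
import sqlite3


def candy_sales_report(min_quantity=0):
    """Return (candy name, total units sold) rows from a toy candy-store database.

    The schema and the sample rows are hypothetical; they only illustrate
    how structured data fits neatly into tables.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE candy (sku TEXT PRIMARY KEY, name TEXT, price REAL);
        CREATE TABLE sales (sku TEXT, quantity INTEGER, sold_on TEXT);
        INSERT INTO candy VALUES ('C001', 'Gummy Bears', 1.99);
        INSERT INTO candy VALUES ('C002', 'Licorice', 2.49);
        INSERT INTO sales VALUES ('C001', 10, '2015-01-04');
        INSERT INTO sales VALUES ('C002', 3, '2015-01-04');
        INSERT INTO sales VALUES ('C001', 5, '2015-01-05');
        """
    )
    # The join matches each sale to its candy by SKU; the where clause
    # keeps only sales of at least min_quantity units.
    rows = conn.execute(
        """
        SELECT candy.name, SUM(sales.quantity) AS units
        FROM sales
        JOIN candy ON candy.sku = sales.sku
        WHERE sales.quantity >= ?
        GROUP BY candy.name
        ORDER BY candy.name
        """,
        (min_quantity,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, units in candy_sales_report():
        print(name, units)
```

Every value sits in a named column with a known type, which is exactly why a question like “how many units of each candy did we sell?” takes only a few lines.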

Unstructured Data:

Whatever your crazy uncle shares on Facebook. The weird videos, the conspiracy articles, the vaguely uncomfortable and probably not PC tweets. You know the drill. Unstructured data is data that doesn’t have a predefined model (or metadata) to describe it. Most unstructured data is very text heavy: essays, articles, blogs and reviews receive most of the focus right now.

But what about something in between? Imagine this JSON response:

{
	"article": {
		"author": "Ross",
		"tags": ["cool guy", "trendy topics"],
		"text": "Here is my article on big data. Oohh"
	}
}

On the one hand, we know the author, tags and text right then and there, so it’s structured. On the other hand, “cool guy” and “trendy topics” aren’t absolute values like $1.99 or 1/4/2015. So it’s a toss-up. How do we break the tie? Bad question, readers. Let’s instead ask how we can use unstructured data as if it’s structured, starting with movie reviews in my next blog post. The point I’d like to make here is that big data is useless (like all data) without appropriate analysis. The analysis for big data is just cooler.
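As a small taste of what “using unstructured data as if it’s structured” could mean, here is a minimal sketch that takes the article response above (written as strict JSON), reads the structured fields directly, and turns the free text into word counts. The summarize function and its output fields are my own invention for illustration, and counting words is about the crudest analysis there is, not a recommended method.

```python
import json
from collections import Counter

# The response from above, written as strict JSON.
RESPONSE = """
{
    "article": {
        "author": "Ross",
        "tags": ["cool guy", "trendy topics"],
        "text": "Here is my article on big data. Oohh"
    }
}
"""


def summarize(raw):
    """Pull structured fields out of the response and count words in the text.

    Hypothetical helper: the returned keys are illustrative only.
    """
    article = json.loads(raw)["article"]
    # Structured part: these fields can be read by name.
    author = article["author"]
    tag_count = len(article["tags"])
    # Unstructured part: split the free text into lowercase words,
    # dropping surrounding punctuation, then count each word.
    words = [w.strip(".,!?").lower() for w in article["text"].split()]
    word_counts = Counter(w for w in words if w)
    return {"author": author, "tag_count": tag_count, "word_counts": word_counts}


if __name__ == "__main__":
    summary = summarize(RESPONSE)
    print(summary["author"], summary["tag_count"])
    print(summary["word_counts"].most_common(3))
```

The author and tags come out for free because the response names them. The text only becomes something you can sort and sum after you decide how to chop it up, and that decision is where the analysis starts.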
