DataMarket blog

Data, visualization and startup life

In The World of Data, Context is Everything


As we all know, the answer to The Ultimate Question of Life, the Universe, and Everything is 42. Unfortunately the question is still unknown, and the lack of that context renders the answer meaningless altogether. The answer is not enough. Without the proper context, “42” is just a meaningless number.

On DataMarket.com you can currently find more than 1 billion fact values, or “facts” for short.

Facts are numbers. They could be other things too. But on DataMarket — for now — all the “facts” are numbers.

Then again, a number is not necessarily a fact. A number is a fact only when it is associated with interpretive information. That is: in context.

Just take a look at this chart. Even though all the values are there, this chart tells you … nothing:

Interpretive information

On DataMarket, we express interpretive information in the form of attribute values, like:

  • Country: Sweden
  • Species/Breed of poultry: Turkeys (chicks for fattening)
  • Activity of hatcheries: Chicks hatched
  • Month: June 2008
  • Title: Poultry

A fact may have any number of attributes. In fact, the more attributes it has, the better you can understand its meaning.

These attribute values collectively lend meaning to the number, making it a fact. A number is just a number. But associated with all of the above attribute values, it might mean the number of turkey-chicks hatched for fattening in Sweden in June 2008.
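One way to picture this in code is as a number paired with a set of attribute values. The sketch below is only an illustration and not DataMarket's actual data model: the `Fact` class and its `describe` method are hypothetical names, and 42 is simply the placeholder number from the opening of this post.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    """A number plus the attribute values that give it meaning (hypothetical model)."""
    value: float
    attributes: dict = field(default_factory=dict)

    def describe(self) -> str:
        # Without attributes there is nothing to interpret the number with.
        if not self.attributes:
            return f"{self.value} (just a number)"
        context = "; ".join(f"{name}: {val}" for name, val in self.attributes.items())
        return f"{self.value} ({context})"


# Attribute values copied from the list above; 42 is a placeholder value.
turkeys = Fact(
    value=42,
    attributes={
        "Country": "Sweden",
        "Species/Breed of poultry": "Turkeys (chicks for fattening)",
        "Activity of hatcheries": "Chicks hatched",
        "Month": "June 2008",
        "Title": "Poultry",
    },
)

print(turkeys.describe())         # the number together with its context
print(Fact(value=42).describe())  # the same number with no context at all
```

Keeping the attributes in a plain dictionary reflects the point that a fact may carry any number of them.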

Instead of “Does it have meaning?”, ask “How much meaning?” A fact is meaningful and unambiguous to the extent that its attributes make it so.

The fewer the attributes, and the less meaningful they are, the more the fact is really just a number, not a fact.

Take the above fact – a number value associated with “Country: Sweden”; “Month: June 2008”; “Species/Breed of poultry: Turkeys (chicks for fattening)” and “Activity of hatcheries: Chicks hatched”. Try omitting any one of the attributes and figuring out what the fact might then mean:

  • Skip “Country: Sweden”: Is this then the total number of hatched turkey chicks in all countries? Or in one or more unspecified countries?
  • Skip “Month: June 2008”: Is this the total number ever hatched in Sweden? Or over some other unspecified period?
  • Skip “Species/Breed of poultry: Turkeys (chicks for fattening)”: Is it the total amount of poultry hatched in Sweden in the given month? Chickens included? What about those for egg-laying?
  • Skip “Activity of hatcheries: Chicks hatched”: Is this the number of chicks currently alive? Or ever born? Or dead? Or what?
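The guessing game above can be sketched as a lookup. In the hypothetical mini data set below, every number except the placeholder 42 is invented purely for illustration, as are the attribute values “Denmark” and “July 2008”; `matching_facts` is an assumed helper name. With all four attributes given, exactly one number matches. Drop one attribute and several numbers match, so the question no longer has a single answer.

```python
# Hypothetical mini data set. Attribute names follow the example above;
# all numbers other than the placeholder 42 are invented for illustration.
SPECIES = "Turkeys (chicks for fattening)"
ACTIVITY = "Chicks hatched"

FACTS = [
    ({"Country": "Sweden", "Month": "June 2008",
      "Species/Breed of poultry": SPECIES,
      "Activity of hatcheries": ACTIVITY}, 42),
    ({"Country": "Denmark", "Month": "June 2008",
      "Species/Breed of poultry": SPECIES,
      "Activity of hatcheries": ACTIVITY}, 17),
    ({"Country": "Sweden", "Month": "July 2008",
      "Species/Breed of poultry": SPECIES,
      "Activity of hatcheries": ACTIVITY}, 23),
]


def matching_facts(facts, query):
    """Return the values of all facts whose attributes include every pair in `query`."""
    return [
        value
        for attributes, value in facts
        if all(attributes.get(name) == wanted for name, wanted in query.items())
    ]


full_query = {
    "Country": "Sweden",
    "Month": "June 2008",
    "Species/Breed of poultry": SPECIES,
    "Activity of hatcheries": ACTIVITY,
}

# Fully specified: a single number matches.
print(matching_facts(FACTS, full_query))

# Omit "Country": numbers for more than one country match,
# and nothing says whether to pick one of them or add them up.
without_country = {k: v for k, v in full_query.items() if k != "Country"}
print(matching_facts(FACTS, without_country))
```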

Omitted attributes leave the reader guessing

When attributes are omitted, we end up making assumptions instead — in effect making stuff up, often without being aware that we are doing so.

Imagine, for example, a data set titled “Seafood exports” with attributes “Country: United Kingdom” and “Species: Cod”. Seems straightforward — right? The export of cod from the United Kingdom. What could be the problem?

But if I told you the source of that data set was “Statistics Iceland”? All of a sudden this is more likely to mean Iceland’s cod export to the United Kingdom than the UK’s cod export to everywhere else.

The Ultimate Question revealed!

Communicating and understanding the true meaning of data is tricky.

At DataMarket we do our best to provide our users with all the context we have available from the source to convey the meaning of the data that we are presenting. Unfortunately the providers do not always do a good job of this themselves, so they leave their audiences guessing anyway. Proper metadata and full context are surprisingly often omitted, even by prestigious data providers.

But at least we finally have it: 42 is the answer to the question “How many turkey-chicks did the Swedes hatch for fattening in June 2008?”

Associated with some other set of attributes entirely, the number 42 would be some other (probably unrelated) fact.

And the number 42 alone — well, that’s just a number. Not a fact.

Written by Gunnlaugur Þór Briem

January 4, 2012 at 12:18 am


3 Responses


  1. By far one of the better explanations of the definition of an attribute. I’ll be sharing this with a few data analysts who I know will benefit.

    Ed

    January 4, 2012 at 2:46 am

  2. Thanks for this post: an excellent demonstration of the importance of context for data.

    But I can’t help pointing out – with a big smile, mind you – that your choice of 42 as a context-less number is a bit ironic, because it has a cultural context of its own. 42, of course, is the “Ultimate Answer to the Ultimate Question of Life, The Universe, and Everything” in The Hitchhiker’s Guide to the Galaxy. :)

    Aaron Bradley (@aaranged)

    January 4, 2012 at 5:45 pm

