In The World of Data, Context is Everything
As we all know, the answer to The Ultimate Question of Life, the Universe, and Everything is 42. Unfortunately, the question itself is still unknown, and without that context the answer is meaningless. On its own, “42” is just a number.
On DataMarket.com you can currently find more than 1 billion fact values, or “facts” for short.
Facts are numbers. In principle they could be other things too, but on DataMarket, for now, all the “facts” are numbers.
Then again, a number is not necessarily a fact. A number is a fact only when it is associated with interpretive information. That is: in context.
Just take a look at this chart. Even though all the values are there, this chart tells you … nothing:
On DataMarket, we express interpretive information in the form of attribute values, like:
- Country: Sweden
- Species/Breed of poultry: Turkeys (chicks for fattening)
- Activity of hatcheries: Chicks hatched
- Month: June 2008
- Title: Poultry
A fact may have any number of attributes. In general, the more attributes it has, the better you can understand its meaning.
These attribute values collectively lend meaning to the number, making it a fact. A number is just a number. But associated with all of the above attribute values, it might mean the number of turkey-chicks hatched for fattening in Sweden in June 2008.
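To make the idea concrete, here is a minimal Python sketch of a fact as a number plus a mapping of attribute names to values. The `Fact` class and its `describe` method are hypothetical illustrations, not DataMarket's actual data model, and the value 42 is only a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Fact:
    """Hypothetical model: a number plus the attribute values that give it meaning."""
    value: float
    attributes: Dict[str, str] = field(default_factory=dict)

    def describe(self) -> str:
        # With no attributes, all we can report is the bare number.
        if not self.attributes:
            return f"{self.value} (just a number, no context)"
        # Otherwise list every attribute value alongside the number.
        context = "; ".join(f"{name}: {val}" for name, val in self.attributes.items())
        return f"{self.value} [{context}]"


# Placeholder value paired with the attribute values listed above.
turkeys = Fact(42, {
    "Country": "Sweden",
    "Species/Breed of poultry": "Turkeys (chicks for fattening)",
    "Activity of hatcheries": "Chicks hatched",
    "Month": "June 2008",
    "Title": "Poultry",
})

print(turkeys.describe())
print(Fact(42).describe())  # prints "42 (just a number, no context)"
```

The same number appears in both calls; only the attached attributes decide whether it reads as a fact or as a bare number.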
Instead of “Does it have meaning?”, ask “How much meaning?” A fact is meaningful and unambiguous to the extent that its attributes make it so.
The fewer the attributes, and the less meaningful they are, the closer the “fact” comes to being just a number.
Take the above fact – a number value associated with “Country: Sweden”; “Month: June 2008”; “Species/Breed of poultry: Turkeys (chicks for fattening)” and “Activity of hatcheries: Chicks hatched”. Try omitting any one of the attributes and figuring out what the fact might then mean:
- Skip “Country: Sweden”: Is this then the total number of hatched turkey chicks in all countries? Or in one or more unspecified countries?
- Skip “Month: June 2008”: Is this the total number ever hatched in Sweden? Or over some other unspecified period?
- Skip “Species/Breed of poultry: Turkeys (chicks for fattening)”: Is it the total number of poultry hatched in Sweden in the given month? Chickens included? What about those for egg-laying?
- Skip “Activity of hatcheries: Chicks hatched”: Is this the number of chicks currently alive? Or ever born? Or dead? Or what?
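One way to see the effect of the omissions in the bullets above is to count how many readings remain possible once an attribute is dropped. The sketch below assumes a small, entirely hypothetical set of candidate values per dimension (the countries, months, and alternative categories are invented for illustration); `possible_readings` simply enumerates every combination consistent with the attributes that are known.

```python
from itertools import product
from typing import Dict, List

# Hypothetical candidate values per dimension; a real data set would have many more.
DIMENSIONS: Dict[str, List[str]] = {
    "Country": ["Sweden", "Norway", "Denmark"],
    "Species/Breed of poultry": ["Turkeys (chicks for fattening)", "Chickens (for egg-laying)"],
    "Activity of hatcheries": ["Chicks hatched", "Eggs placed in incubation"],
    "Month": ["May 2008", "June 2008", "July 2008"],
}


def possible_readings(known: Dict[str, str]) -> List[Dict[str, str]]:
    """Return every combination of dimension values consistent with `known`."""
    # A known attribute pins its dimension to one value; an omitted one stays open.
    options = [[known[dim]] if dim in known else values
               for dim, values in DIMENSIONS.items()]
    return [dict(zip(DIMENSIONS, combo)) for combo in product(*options)]


full = {
    "Country": "Sweden",
    "Species/Breed of poultry": "Turkeys (chicks for fattening)",
    "Activity of hatcheries": "Chicks hatched",
    "Month": "June 2008",
}
without_country = {k: v for k, v in full.items() if k != "Country"}

print(len(possible_readings(full)))             # 1: unambiguous
print(len(possible_readings(without_country)))  # 3: one per candidate country
print(len(possible_readings({})))               # 36: every combination
```

Each omitted attribute multiplies the number of things the same number could mean, which is exactly the guessing the bullets describe.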
Omitted attributes leave the reader guessing
When attributes are omitted, we end up making assumptions instead — in effect making stuff up, often without being aware that we are doing so.
Imagine, for example, a data set titled “Seafood exports” with attributes “Country: United Kingdom” and “Species: Cod”. Seems straightforward — right? The export of cod from the United Kingdom. What could be the problem?
But what if I told you the source of this data set was “Statistics Iceland”? All of a sudden this is more likely to mean Iceland’s cod export to the United Kingdom than the UK’s cod export to everywhere else.
The Ultimate Question revealed!
Communicating and understanding the true meaning of data is tricky.
At DataMarket, we do our best to give our users all the context available from the source, so that the meaning of the data we present comes through. Unfortunately, the providers do not always do a good job of this themselves, which leaves their audiences guessing anyway. Proper metadata and full context are surprisingly often omitted, even by prestigious data providers.
But at least we finally have it: 42 is the answer to the question “How many turkey-chicks did the Swedes hatch for fattening in June 2008?”
Associated with some other set of attributes entirely, the number 42 would be some other (probably unrelated) fact.
And the number 42 alone — well, that’s just a number. Not a fact.