DataMarket’s new US office
DataMarket is setting up an office in the US, more specifically in Cambridge, Massachusetts.
For those of you not familiar with the background, DataMarket was founded in Iceland in the summer of 2008. That’s where our product team is – and will be – located. We initially launched our services here, mainly for the local market, but with the obvious intention of broadening our scope. The opportunity for an active marketplace for data is obviously a global one and certainly not limited to our tiny island of 320 thousand inhabitants!
In fact, today – January 24th – marks the first anniversary of our international data offering.
A lot has happened since. We’ve learned a bit about what works, and a lot about what doesn’t in the emerging field of data markets. We’ve managed to build a significant and largely recurring revenue base, even though some of the revenues are coming from services we didn’t necessarily foresee a year ago. We’ve established good connections with some of the most interesting data providers out there. And we’ve learned a lot from feedback from our users and customers. Some of that feedback has already been incorporated in our product and technology.
At the Strata conference in late February, we will announce a range of new features, subscription plans and data sources, all resulting from the lessons we’ve learned in the last 12 months. More on that later!
The US office is also a result of this learning curve. Despite all the wonders of modern communication technologies, location still matters. Nothing beats meeting people face to face, looking them in the eye, listening to them describe their challenges and watching their reaction to your demo, your pitch and your sales arguments. Hardly anything sells itself over the Internet. Even Google has an army of people doing traditional sales: wining and dining, manning call centers, networking, meeting, greeting and doing business like business has been done for ages. And they’re Google!
Also, it turns out that there are more enterprise-level opportunities in our business than we originally thought. And while data and feature subscription plans can indeed be marketed and sold online, enterprise solutions most certainly cannot.
So, we’re setting up an office in the United States to build out our sales, marketing and business development operations.
And why Cambridge? First of all, the East Coast was almost a no-brainer for us. The industries that have expressed the most interest in what we are doing (research, media and finance) are stronger on the East Coast than on the West. Our sales pipeline is dominated by companies in Boston, New York and Washington, D.C. The same is true of investors interested in the type of business we’re building. To overgeneralize, the data start-ups we’ve seen funded on the West Coast tend to be at the social, consumer-oriented end of the spectrum, while those on the East Coast seem to be more of the B2B, business analytics and financial kind.
We had our eyes set pretty firmly on New York, but then, within a few weeks late last year, we had good success with a few really interesting leads in the Boston area. In fact, we’ve already signed a couple of super-interesting customers there and there are more in the pipeline. The research industry is really strong in the Boston area, and it seems to be quite interconnected, giving us a lot of opportunities to work the network and get more business going. Last but not least, we value being close to the great universities in the area. So Cambridge it is.
And the commute from Boston to New York is quite convenient – especially compared to the commute from Iceland.
I (Hjalmar) will be moving over in a few weeks’ time to start building the team and our success in this very dynamic market. I’d be most interested in hearing from people who would like to join our team or explore opportunities to work with us. If you are interested, please do not hesitate to get in touch.
Exciting times!
DataMarket’s Declaration of Principles
At DataMarket we take data seriously. We are committed to doing our best to communicate the data we work with in the most objective, understandable and clear way possible.
To guide our decision making, to state this commitment to our partners and customers, and to remind ourselves of it, we’ve written our very own “Declaration of Principles” (in the spirit of Citizen Kane) that now proudly hangs on our office wall.
The declaration reads as follows:
Declaration of Principles
We respect data
- We will remain impartial and unbiased.
- We will deliver to users the same data we received from the data provider.
- We will never taint the presentation of data with value judgments.
Our performance is impressive
- We will never let users wait more than two seconds.
- We will never cut corners so that performance suffers.
- We will always be up.
Our user experience is delightful
- We help users find and understand the data that is relevant to them.
- We will never tell a user to read the manual.
- We will never let user experience suffer to make our lives easier.
- We will always continue to improve the user experience.
Our charts and tables communicate the data correctly AND look fantastic
- Our automated charts are of tailor-made quality.
- Our data views hold all the information needed to understand the data.
- We always convey data providers, sources, units, URLs for further info, licenses, titles, legends, selections and other metadata.
We respect our customers
- We will always strive to make our customers happy.
- We will always be straight with our customers.
We help you fall in love with data!
You can download the poster in PDF format here.
Comments and suggestions are welcomed.
I’m forever blowing bubbles
“Bubble” is a pleasant word, isn’t it? For me, it brings back memories of childhood — real or invented — playing in the garden, blowing big globs of washing-up liquid into the air and watching them pop above my head.
But put “economic” in front of it and we have something far less pleasant. Economic bubbles are most certainly not filled with fun, although there’s obviously enough pleasure (and by pleasure I mean money) to be found in the period before the bubble pops that they keep on coming. Tulip mania, the South Sea bubble, the Roaring Twenties stock market bubble, the dot-com bubble (originally and cunningly branded the dot-com boom), all of which have culminated in property bubbles we see around the world today.
With the news this week that house prices in Ireland have dropped by 60% across the country, we might ask: just how bad is the property bubble? Well, we at DataMarket have around 2,800 public datasets from the Federal Reserve Bank of St. Louis (one of the twelve regional banks that make up the central banking authority of the United States). One recently updated dataset that gives us some insight shows the number of new homes sold every month in the United States between 1963 and 2011. In 1963 a total of 560,000 new homes were sold across the country, and by the peak in 2005 that had more than doubled to 1,283,000.

New homes sold in the United States, 1963–2010. Annual data aggregated from the monthly data provided by the Federal Reserve Bank of St. Louis.
Then, the cliff-face. By 2008 the number of new homes sold had fallen to just 485,000, a mere 38% of the level three years earlier. The total hadn’t been this low since 1982.
Sales kept on dropping. By the end of 2010 annual sales stood at 322,000 — the lowest yearly total on record — and for the first eleven months of 2011 the figure stands at only 281,000. Unless December is a bumper month (and house sales are always lowest in winter) 2011 looks like stealing the award for “Worst Year on Record” from its slightly older brother, 2010. The lowest month on record is November 2010 (20,000) and we may see December 2011 steal that award too.
It’s not all bad news though: unemployment is down. Perhaps things are looking up and we may not be too far away from that next bubble after all.
In The World of Data, Context is Everything
As we all know, the answer to The Ultimate Question of Life, the Universe, and Everything is 42. Unfortunately the question is still unknown, and the lack of that context renders the answer meaningless altogether. The answer is not enough. Without the proper context, “42” is just a meaningless number.
On DataMarket.com you can currently find more than 1 billion fact values, or “facts” for short.
Facts are numbers. They could be other things too. But on DataMarket — for now — all the “facts” are numbers.
Then again, a number is not necessarily a fact. A number is a fact only when it is associated with interpretive information. That is: in context.
Just take a look at this chart. Even though all the values are there, this chart tells you … nothing:

Interpretive information
On DataMarket, we express interpretive information in the form of attribute values, like:
- Country: Sweden
- Species/Breed of poultry: Turkeys (chicks for fattening)
- Activity of hatcheries: Chicks hatched
- Month: June 2008
- Title: Poultry
- …
A fact may have any number of attributes. Actually, the more attributes, the better you may be able to understand the meaning.
These attribute values collectively lend meaning to the number, making it a fact. A number is just a number. But associated with all of the above attribute values, it might mean the number of turkey-chicks hatched for fattening in Sweden in June 2008.
Instead of “Does it have meaning?”, ask “How much meaning?” A fact is meaningful and unambiguous to the extent that its attributes make it so.
The fewer the attributes, and the less meaningful they are, the more the fact is really just a number, not a fact.
Take the above fact – a number value associated with “Country: Sweden”; “Month: June 2008”; “Species/Breed of poultry: Turkeys (chicks for fattening)” and “Activity of hatcheries: Chicks hatched”. Try omitting any one of the attributes and figuring out what the fact might then mean:
- Skip “Country: Sweden”: Is this then the total number of hatched turkey chicks in all countries? Or in one or more unspecified countries?
- Skip “Month: June 2008”: Is this the total number ever hatched in Sweden? Or over some other unspecified period?
- Skip “Species/Breed of poultry: Turkeys (chicks for fattening)”: Is it the total amount of poultry hatched in Sweden in the given month? Chickens included? What about those for egg-laying?
- Skip “Activity of hatcheries: Chicks hatched”: Is this the number of chicks currently alive? Or ever born? Or dead? Or what?
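To make the thought experiment above concrete, here is a minimal sketch in Python. It is not how DataMarket stores facts; the `FACTS` list and the `matching_facts` helper are hypothetical names invented for illustration, and every row except the running example of 42 is made up. The idea is simply that a full set of attributes pins down one number, while dropping an attribute lets several numbers match.

```python
# A fact is a number plus the attribute values that give it meaning.
# Only the value 42 is the post's running example; the other rows are
# invented purely for illustration.
FACTS = [
    {"value": 42, "attributes": {
        "Country": "Sweden",
        "Month": "June 2008",
        "Species/Breed of poultry": "Turkeys (chicks for fattening)",
        "Activity of hatcheries": "Chicks hatched",
    }},
    {"value": 1200, "attributes": {  # hypothetical row
        "Country": "Denmark",
        "Month": "June 2008",
        "Species/Breed of poultry": "Turkeys (chicks for fattening)",
        "Activity of hatcheries": "Chicks hatched",
    }},
    {"value": 37, "attributes": {  # hypothetical row
        "Country": "Sweden",
        "Month": "July 2008",
        "Species/Breed of poultry": "Turkeys (chicks for fattening)",
        "Activity of hatcheries": "Chicks hatched",
    }},
]


def matching_facts(facts, wanted):
    """Return every fact whose attributes contain all pairs in `wanted`."""
    return [
        fact for fact in facts
        if all(fact["attributes"].get(name) == value
               for name, value in wanted.items())
    ]


# With every attribute present, exactly one number matches.
full_context = dict(FACTS[0]["attributes"])
# Drop "Country" and the same description fits more than one number.
without_country = {k: v for k, v in full_context.items() if k != "Country"}

print(len(matching_facts(FACTS, full_context)))
print(len(matching_facts(FACTS, without_country)))
```

With the country omitted, the description no longer identifies a single number, which is exactly the guessing game described above.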
Omitted attributes leave the reader guessing
When attributes are omitted, we end up making assumptions instead — in effect making stuff up, often without being aware that we are doing so.
Imagine, for example, a data set titled “Seafood exports” with attributes “Country: United Kingdom” and “Species: Cod”. Seems straightforward — right? The export of cod from the United Kingdom. What could be the problem?
But if I told you the source of this data set was “Statistics Iceland”? All of a sudden this is more likely to mean Iceland’s cod export to the United Kingdom than the UK’s cod export to everywhere else.
The Ultimate Question revealed!
Communicating and understanding the true meaning of data is tricky.
At DataMarket we do our best to provide our users with all the context we have available from the source to convey the meaning of the data that we are presenting. Unfortunately the providers do not always do a good job of this themselves, so they leave their audiences guessing anyway. Proper metadata and full context are surprisingly often omitted, even by prestigious data providers.
But at least we finally have it: 42 is the answer to the question “How many turkey-chicks did the Swedes hatch for fattening in June 2008?”
Associated with some other set of attributes entirely, the number 42 would be some other (probably unrelated) fact.
And the number 42 alone — well, that’s just a number. Not a fact.
The Anatomy of a Fox News Chart
A few days ago MediaMatters wrote about a misleading chart aired on Fox News:

…pointing out the fact that the data point for November at 8.6% (a two-year low) was obviously misplaced:

For comparison, MediaMatters showed the same period on a chart taken from the website of the US Bureau of Labor Statistics (select 2011-2011 as a period to replicate), claiming an “alarming pattern of dishonesty”:

The difference in the charts is certainly striking, but this comparison is not completely fair either.
The two problems with comparing the BLS chart to the Fox chart are:
- They show different ranges on the Y-axis: BLS shows the range from approximately 8.55%-9.25% while the Fox chart at first glance seems to show approximately 7.75%-10.25%
- They have different aspect ratios: The aspect ratio of the chart area in BLS’ chart is ~2.1 compared to ~3.4 in Fox’s
…both of which make Fox’s mistake/lie look worse in comparison.
As an enthusiast for accurate visual representation of data, I wanted to do better and overlay Fox’s original chart with a line drawn using the same aspect ratio and axes, leading to some interesting findings…
Dissecting Fox’s chart
First, let’s establish the correct Y-axis. The horizontal guidelines on the chart would normally represent whole numbers and “nice” fractions thereof. As most of the values hover around 9.0%, that line should be easy to establish. However, none of the 9.0% values hit any of the lines accurately. As a matter of fact, all the 9.0% points fall right between two horizontal guidelines:

The highest and lowest values on the chart (other than the last, wrongly drawn one) are 8.8% and 9.2%. Both fall close to guidelines that are 1.5 guideline-gaps away from the 9.0% value:

That would leave the interval between guidelines at 0.133%. Highly unusual, but – ok. Now we extrapolate this finding for all the guidelines:

Interestingly enough, the gap intervals differ, so this doesn’t match completely, but adding guidelines at regular intervals reveals that the chart actually shows values from approximately 8.3% to 9.6%, even though the axis labels say 8%-10%, and those labels aren’t even at the ends of the axis!
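For those who want to check the arithmetic, here is a small Python sketch of the reasoning above. The readings (9.0% sitting halfway between two guidelines, 9.2% sitting on a guideline 1.5 gaps away) are taken from the text; the names `gap` and `value_at` and the sample offsets are mine and purely illustrative, since the exact number of guidelines on Fox’s chart is not stated here.

```python
# Readings taken from the chart as described in the text.
REFERENCE_VALUE = 9.0  # falls right between two guidelines
KNOWN_VALUE = 9.2      # falls close to a guideline
GAPS_BETWEEN = 1.5     # guideline gaps separating the two readings

# Size of one guideline gap, in percentage points.
gap = (KNOWN_VALUE - REFERENCE_VALUE) / GAPS_BETWEEN


def value_at(gaps_from_reference):
    """Value at a position given in guideline gaps above (+) or below (-) 9.0%."""
    return REFERENCE_VALUE + gaps_from_reference * gap


print(f"gap between guidelines: {gap:.3f} percentage points")

# Illustrative offsets only: guidelines sit at half-gap offsets from 9.0%.
for offset in (-5.5, -1.5, 1.5, 4.5):
    print(f"{offset:+.1f} gaps -> {value_at(offset):.2f}%")
```

Stepping out from 9.0% in whole gaps from the nearest guideline is how the regular grid in the next image was laid out.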
Drawing the actual unemployment values as a line on our now established grid gives us this:

…and removing the manually drawn guidelines leaves us with the chart as it would have been had it been drawn correctly to begin with:

This reveals even further issues:
- The line from January’s 9.0% to February’s 8.9% to March’s 8.8% (obviously a straight line) is not straight on Fox’s chart. This is just as obvious looking at the initial chart now that it has been mentioned.
- The spacing between the months on the X-axis differs from month to month (just look at June and March for example)
No software would ever draw a chart with these defects, leading me to the conclusion that Fox’s chart is actually a hand-drawn line on top of a background that looks like a chart, with hand-input labels on each data point as well as on the axis.
Whether Fox does this to be able to manipulate their statistical representations by hand, or whether this chart is just sloppy work by a graphic designer lacking the proper software for the job, I’ll leave to your speculation, but my conclusion is firm:
The chart is drawn by hand! That leaves me wondering: Does Fox News hand-draw all their charts?
Data Pivoting, SVG Image Export, Improved Line Charts and more
We have recently developed and released some new features here on DataMarket and I would like to tell you about them.
- Improved data selector:
The data selector in the data view is now fully consistent between flat and hierarchical datasets. We have built in search for values (really helpful with large dimensions), clear/select all, and shift-select for any number of listed values, and the general look and feel has been much improved. This gives users a better experience when they traverse datasets looking for that hidden nugget of information! Users can now also change the width of the data selector panel on the left-hand side for better usability when working with datasets that have long value names. Just grab the panel’s edge and drag it to expand.
- Pivoting data:
Another new feature in the data view is the ability to pivot data. It is a simple implementation that allows you to toggle how dimensions are mapped to a graph. This is best explained with an example. The two charts show the same data selection. The only difference is in the pivoting:

To pivot the data, simply click the pivot button above the chart in the data view. Note that there is nothing to pivot unless there are at least two dimensions (e.g. “Products” and “Geopolitical Entity” as in the above examples) with multiple selected values.
- SVG export:
We had some customers that needed more control over their image export so we’ve added SVG export from the data view for users with a PRO subscription. So now if you need to export your graph, you can choose between PDF, PNG and SVG. 
- Knots on values in line charts:
Our line charts have sometimes suffered when a series follows a straight line, in the sense that it can become difficult to see which points are actual data points and which are interpolated between them. This has now been mitigated by showing knots on data points for enhanced readability when the number of series and data points is low.
- Bug fixes:
Finally we have fixed a number of bugs, most prominently:
- The time selector now correctly reflects the time spanned by multiple datasets.
- Stacked bar chart now grows correctly with number of series.
- Add/remove datasets now works much more smoothly.
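For the curious, the pivoting feature described above boils down to swapping which dimension becomes the chart’s series and which becomes its categories. The Python sketch below is only an illustration of that idea, not our actual implementation; the `chart_series` function, the `DATA` values and the product and country names are all hypothetical.

```python
DIMENSIONS = ("Products", "Geopolitical Entity")

# Hypothetical values keyed by (product, geopolitical entity).
DATA = {
    ("Wheat", "Iceland"): 10,
    ("Wheat", "Sweden"): 20,
    ("Barley", "Iceland"): 30,
    ("Barley", "Sweden"): 40,
}


def chart_series(data, dimensions, series_dimension):
    """Map one dimension to chart series and the other to categories."""
    series_index = dimensions.index(series_dimension)
    category_index = 1 - series_index  # this sketch handles exactly two dimensions
    series = {}
    for key, value in data.items():
        series.setdefault(key[series_index], {})[key[category_index]] = value
    return series


# One series per product, with countries along the axis...
by_product = chart_series(DATA, DIMENSIONS, "Products")
# ...and the pivoted view: one series per country, with products along the axis.
by_entity = chart_series(DATA, DIMENSIONS, "Geopolitical Entity")

print(by_product)
print(by_entity)
```

Both views hold exactly the same numbers, which is also why there is nothing to pivot when only one dimension has multiple selected values.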
As always, we appreciate your feedback, comments and ideas!
DataMarket welcomes a new team member
We are pleased to announce a new addition to the DataMarket team. Thorsteinn Yngvi Gudmundsson joins the company as VP Operations.
Thorsteinn Yngvi joins us from Industria where he served as an Executive Director responsible, among other things, for establishing several new telecoms ventures for investors.
Thorsteinn holds an MBA from Reykjavik University and brings 15 years of experience managing and facilitating company growth and developing products for the ICT sector, both in Iceland and abroad.
As VP Operations, Thorsteinn will be responsible for making sure our production team has the opportunity to keep delivering new goodies for you and improving on the old ones. He will also ensure that on-site implementation of our products and our support processes go smoothly for our growing customer base, so that it will be even easier to do business with us and use our products.
Useless facts about Thorsteinn: studied landscaping, weighs 65 kilos and his favorite hobbies are photography, reading good books and most recently running.
Using DataMarket from within R
This one is for the R users among you — we know there are plenty!
You can pull any data from DataMarket directly into your R session using the rdatamarket package. And it’s so simple! Here’s how.
Quick start
To install the package, execute this in R:
install.packages('rdatamarket')
Then, from any data view on DataMarket.com — say you’re looking at oil production figures for Angola, Brunei and Egypt — just copy the URL from your browser and paste into a call to dmseries or dmlist:
l <- dmlist("http://datamarket.com/data/set/17tm/#ds=17tm|kqc=17.v.i")
Short URLs (data.is, bit.ly, is.gd, t.co) work too:
l <- dmlist("http://data.is/nyFeP9")
That dmlist function gives you a data.frame object. To get a zoo timeseries object, use dmseries:
plot(dmseries("http://data.is/nyFeP9"))
If you need to go through an HTTP proxy, set it up this way:
dmCurlOptions(proxy="http://outproxy.mycompany.com")
Reading metadata
Get a dataset object (find the ID in a DataMarket URL, or just paste in the whole URL if you like):
oil <- dminfo("17tm")
oil <- dminfo("http://datamarket.com/data/set/17tm/#ds=17tm|kqc=17.v.i")
print(oil)
This yields:
Title: "Oil: Production tonnes"
Provider: "BP"
Dimensions:
  "Country" (60 values):
    "Algeria"
    "Angola"
    "Argentina"
    "Australia"
    "Azerbaijan"
    [...]
See all the values of the Country dimension:
oil$dimensions[[1]]$values
This yields:
           a           17            d            z           1l
   "Algeria"     "Angola"  "Argentina"  "Australia" "Azerbaijan"
          1b            v           1h           13           1o
    "Brazil"     "Brunei"   "Cameroon"     "Canada"       "Chad"
[...]
Here’s a dataset with two dimensions (besides time):
p <- dminfo("http://datamarket.com/data/set/12r9/male-population-thousands")
print(p)
Title: "Male population (thousands)"
Provider: "United Nations" (citing "United Nations Population Division")
Dimensions:
  "Country or Area" (229 values):
    "Afghanistan"
    "Africa"
    "Albania"
    "Algeria"
    "Angola"
    [...]
  "Variant" (5 values):
    "Constant-fertility scenario"
    "Estimate variant"
    "High variant"
    "Low variant"
    "Medium variant"
Reading data
From that last dataset, fetch the UN’s population prediction for Sweden and Somalia in the constant-fertility scenario (note the “(thousands)” in the dataset title):
dmseries(p, 'Country or Area'=c("Somalia", "Sweden"),
         Variant="Constant-fertility scenario")
             Somalia   Sweden
2010-07-01  4642.070 4613.551
2015-07-01  5357.233 4725.918
2020-07-01  6211.305 4840.434
2025-07-01  7243.572 4942.865
2030-07-01  8490.929 5021.646
2035-07-01  9990.910 5083.680
2040-07-01 11793.524 5144.685
2045-07-01 13966.319 5211.212
2050-07-01 16597.110 5281.437
The same as a data.frame:
dmlist(p, 'Country or Area'=c("Somalia", "Sweden"),
       Variant="Constant-fertility scenario")
   Country.or.Area                     Variant Year     Value
 1         Somalia Constant-fertility scenario 2010  4642.070
 2         Somalia Constant-fertility scenario 2015  5357.233
 3         Somalia Constant-fertility scenario 2020  6211.305
 4         Somalia Constant-fertility scenario 2025  7243.572
 5         Somalia Constant-fertility scenario 2030  8490.929
 6         Somalia Constant-fertility scenario 2035  9990.910
 7         Somalia Constant-fertility scenario 2040 11793.524
 8         Somalia Constant-fertility scenario 2045 13966.319
 9         Somalia Constant-fertility scenario 2050 16597.110
10          Sweden Constant-fertility scenario 2010  4613.551
11          Sweden Constant-fertility scenario 2015  4725.918
12          Sweden Constant-fertility scenario 2020  4840.434
13          Sweden Constant-fertility scenario 2025  4942.865
14          Sweden Constant-fertility scenario 2030  5021.646
15          Sweden Constant-fertility scenario 2035  5083.680
16          Sweden Constant-fertility scenario 2040  5144.685
17          Sweden Constant-fertility scenario 2045  5211.212
18          Sweden Constant-fertility scenario 2050  5281.437
The above demonstrates dimension filtering; dimensions and their values can be specified by their $id or their $title, to fetch the data filtered to specific values of a dimension. If no filtering is specified, all of the dataset is fetched (careful: some datasets are enormous, and the DataMarket.com API may truncate extremely large responses).
DataMarket release schedule
DataMarket develops its services on a bi-weekly release schedule, and we thought readers of this blog might be interested in what goes out in each release. The major features are covered in their own blog posts, but there is a constant stream of smaller features and bug fixes that go out every other week.
Here is a short overview of the customer-facing changes that were released in this milestone (codenamed Concorde):
- When exporting to XLS, PNG or PDF you now get the name of the dataset as the filename instead of a generic filename.
- Fixed a bug with negative values in bar charts.
- Time axis on column graphs is now positioned correctly.
- Single fact line chart values are now centered correctly.
- Event provider label on gagnatorg.capacent.is now follows the selected language.
- Miscellaneous Internet Explorer fixes regarding layout and stability.
- St. Louis Fed data importer made more stable.


