DataMarket – Energy: New Business enabled by Open Data
Last week DataMarket introduced a new product: an energy-specific data service called simply DataMarket – Energy.
The venue at which we introduced the service was quite unusual. We were lucky enough to be invited, along with a select group of other startups and innovators working with energy data, to present our work at the White House at an event called Energy Datapalooza. We have since jokingly said that to top this venue for our next product announcement, we will have to book the International Space Station. I’m working on that.
Here’s a short video of my presentation there and the unveiling of our new service:
For those of you who have been following DataMarket for a while, you will notice that the business model for this new product is significantly different from the one we have been running with so far.
When we originally kicked DataMarket.com off with international data in early 2011, there was only one thing users could pay us for: a low-priced premium subscription that gave access to additional features, such as more advanced data export formats, automated reports and a few other things. A couple of months later we added the first premium data to the site: data from premium data providers such as the Economist Intelligence Unit (links to EIU data on DataMarket), resold through our site.
However, using the site’s core functionality – the ability to search, visualize, compare, and download data from the vast collections of Open Data that we aggregate – has always been free. As such, DataMarket.com has become quite popular in certain circles. But quite frankly, the two revenue sources have not taken off in a big way.
What has taken off, however, is our technology licensing business. We’ve seen high demand for our data delivery technology from other information companies. Many information companies, such as market research and financial data companies, have identified a strong need to normalize data from a wide variety of sources and to let users access that data through powerful search and online visualization tools. So last February we formally introduced our data publishing tools, most prominently what we now call the Data Delivery Engine: a white-label solution that is already up and running for a few well-known information companies (including Yankee Group and Lux Research), with several others in the implementation stages. This licensing business is where most of our revenue comes from today, so one could really say that we’re now more of a software company than a data company.
The upcoming launch of DataMarket – Energy is another stab at the data side of the equation, but the approach is different in several ways:
- Focus and scope: By focusing on a single industry or vertical we can make the service much more relevant to its users. Instead of solving 10-15% of everybody’s data needs with the kind of macro-economic and demographic data that can be accessed on DataMarket.com, we aim to address 90-100% of the data needs of a much more targeted audience.
- Premium access: We’re selling access to this service at a substantial premium (final pricing is still being decided). Those who see value in the discovery and aggregation services that we add on top of the data will be charged for the “job they hire our product to do”. This does mean that some data that has been made publicly available for free (Open Data) will only be available to DataMarket users behind a paywall. As explained in the presentation above, that doesn’t take the least bit away from the value of the Open Data. On the contrary: the data is still available in its original form from the publishing organizations, but we add a choice on top of that: a nicer and more user-friendly way to access the data for those who are willing to pay for that value-add.
- Targeted sales: Instead of relying as much on PR and viral distribution as we have with DataMarket.com, we’ll use more direct, traditional sales approaches for this new service.
One of the interesting things about running a technology startup is that the same technology can be turned into so many different products without a single line of additional code. Often the only difference is how you promote it, price it and sell it. This can be both a curse and a blessing, and usually a few things need to be thrown at the wall before you find what sticks. Luck is involved too, but as the famous Swedish alpine skier Ingemar Stenmark is quoted saying: “The more I practice, the luckier I get“.
It will be interesting to see if we’ve practiced our data marketing skills enough for the DataMarket – Energy approach to work out.
Best Practices for Publishing Data
Slides from a presentation given by Hjalmar Gislason, founder and CEO of DataMarket, at the Strata Conference in London, October 2012
Worse than a 3D pie chart
I have seen my share of good charts and I have seen my share of bad charts, but I never expected what I saw today.
As you may know, Hjalli and I are writing a book about chart design. We will guide you through choosing the best chart for your story and creating beautiful and effective charts. The book is aimed at those who want, or need, to get a chart out there but aren’t that interested in the whys. We start by looking at the charts the big boys produce by default and go from there, examining the parts and how to improve them.
The first chapter about chart design is about tables, which was fun to write. There was more to say than we expected.
The next chapter focuses on line charts, where I used Numbers, Excel and DataGraph to create default versions of a line chart. As I expected, there were things that could be better designed in the default versions of all three applications. In our opinion, none of the defaults is usable, Numbers least of all.
Today, I dove into the details of my bar chart design. Called up the author of DataGraph to discuss moving axis labels by a pixel. Stared at my screen for half an hour, wondering if I wanted to keep the x axis on the bar chart or not. Then I opened up Numbers and Excel to create the defaults. DataGraph was already open, since I can already make almost perfect charts in it. So I started with the DataGraph default. It disappointed me a bit. The y axis didn’t automatically label my bars. Other than that, it was not pretty but usable. Numbers, to be fair, does automatically label the bars.
Next up was Numbers. At first glance it looked fine, the color of the bars was okay and the bars were labeled correctly. As I checked off items in the designing-a-bar-chart list in my head, everything seemed fine. Until it didn’t. At all.
In disbelief, I went straight to Excel to see if this alarm went off there as well. It did. And there was much wailing and gnashing of teeth. I felt like George Taylor in Planet of the Apes: “You Maniacs! You blew it up! Ah, damn you! God damn you all to hell!”
They didn’t include the zero on the x axis! This is no small omission. Their default bar chart is a lie! When comparing the bars, you must compare the full length of the bars.
You can tell both applications to include the zero, but that should not be needed. Creating a bar or column chart without the zero on the axis shouldn’t even be possible. This is worse than a 3D pie chart.
There, I’ve said it.
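To make the distortion concrete, here is a minimal Python sketch, assuming a bar is drawn from the axis baseline to its value, so that its visible length is the value minus the baseline. The two values and the baseline of 70 are hypothetical, chosen only for illustration.

```python
def drawn_length_ratio(a: float, b: float, baseline: float) -> float:
    """How many times longer bar b is drawn than bar a when the axis starts at `baseline`."""
    # Each bar runs from the baseline to its value, so its visible length is value - baseline.
    return (b - baseline) / (a - baseline)


a, b = 80.0, 100.0  # hypothetical values
print(b / a)                           # 1.25: b is really 25% larger than a
print(drawn_length_ratio(a, b, 70.0))  # 3.0: with the axis starting at 70, b looks three times as big
print(drawn_length_ratio(a, b, 0.0))   # 1.25: a zero baseline keeps lengths proportional to values
```

With the axis starting at 70, a value that is only 25% larger is drawn three times as long. Only a zero baseline keeps the comparison of full bar lengths honest.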
Instant Feedback: It Applies Everywhere
If you work in software development and you haven’t watched Bret Victor’s presentation “Inventing on Principle”, you must do so now:
This video made its rounds among developers and general nerds early this year to much fanfare. The key takeaway – at least for me – was that for a creative process (like programming) to be effective, you must remove as much as possible of what stands between an action and its effect, making the whole process more like the real world – more “tangible”. Or in Victor’s words (recited from memory):
Just like a painter immediately sees the effects of his brush strokes [...] a coder should immediately see the effects of his code changes
This presentation has inspired several projects based on the “instant feedback” concepts Victor sets forth so powerfully.
Several of these projects have been coming to fruition over the last few weeks, including:
- Gabriel Florit’s Livecoding.io
- Khan Academy’s new coding environment – (blog post)
- Chris Granger’s LightTable – (introduction)
- Geoff Goodman’s Plunker
These – as well as most of the examples in Victor’s original presentation – are all IDEs (nerd-speak for whatever nerds use to write programming code).
But while listening to Gabriel Florit presenting livecoding.io at the Boston DataVis meetup yesterday it dawned on me that this is not just about coding.
Yes, I am *that* slow (or I was too distracted to begin with, thinking about how empowering these concepts will be for coding), but: instant feedback should be the default behavior for all software.
Granted, the feedback problem is particularly bad in software development, but think about all the other software you use – or develop:
- Wherever there is an “Apply” button there is room for more instant feedback.
- Wherever there is a modal dialog window there is room for more instant feedback.
- Wherever the user doesn’t see the effects of a change or a choice until several choices or commands have been made, there is room for more instant feedback.
- Wherever the user is changing something he’s unable to see, and must hold it in memory or imagine the results of his actions, there is room for more instant feedback.
…and it will always make the software feel more tangible and “natural” to use.
Of course there are cases where – for performance reasons or otherwise – this may not be feasible, but the industry as a whole is way too stuck in the “Apply changes” mindset.
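As a rough illustration of the difference between the two mindsets, here is a minimal Python sketch contrasting an “Apply” model with an instant-feedback model. The class names and the `render` callback are hypothetical; a real UI toolkit would wire this up through its own event system.

```python
from typing import Callable


class ApplyButtonSetting:
    """Apply-button model: edits are buffered until apply() is called."""

    def __init__(self, value: int, render: Callable[[int], None]) -> None:
        self.value = value
        self._pending = value
        self._render = render

    def set(self, value: int) -> None:
        self._pending = value  # nothing visible changes yet

    def apply(self) -> None:
        self.value = self._pending
        self._render(self.value)  # the user sees the effect only now


class InstantSetting:
    """Instant-feedback model: every edit is rendered immediately."""

    def __init__(self, value: int, render: Callable[[int], None]) -> None:
        self.value = value
        self._render = render

    def set(self, value: int) -> None:
        self.value = value
        self._render(self.value)  # the user sees each change as it happens


# Record what the user would see after each render (a stand-in for redrawing a view).
seen_with_apply: list[int] = []
buffered = ApplyButtonSetting(10, seen_with_apply.append)
buffered.set(12)
buffered.set(14)
print(seen_with_apply)  # []  (two edits, nothing shown yet)
buffered.apply()
print(seen_with_apply)  # [14]

seen_instantly: list[int] = []
instant = InstantSetting(10, seen_instantly.append)
instant.set(12)
instant.set(14)
print(seen_instantly)  # [12, 14]
```

In the first model the user makes two choices blind and only sees the combined result; in the second, every intermediate state is visible, which is exactly the tangibility Victor argues for.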
I can definitely see a number of improvements we can and will make to the already pretty dynamic interface of DataMarket.com, guided by these principles.
I must echo what my colleague, Vidar Masson, said this morning, talking about Bret Victor: “People are going to remember this presentation for a long time!”.
Indeed! And its effects will reach far beyond the IDE.
The 11 Best Data Quotes
Since before I started DataMarket back in 2008, I’ve been collecting funny, insightful and thought-provoking quotes about data and information. Here are my current top 11 favorites:
11. Many have tried to describe the importance of data in industrial, or even agricultural, terms
Data are becoming the new raw material of business
- Craig Mundie, head of research and strategy, Microsoft
Data is the new oil!
- Clive Humby, ANA Senior Marketer’s Summit, 2006
Information is the oil of the 21st century, and analytics is the combustion engine.
- Peter Sondergaard, senior vice president at Gartner
Data is the new oil? No: Data is the new soil.
- David McCandless, TEDGlobal, 2010
10. …others in terms of previous breakthroughs in IT
Data is the Next Intel Inside
- Tim O’Reilly, What Is Web 2.0
9. The love of data visualization is not new
There is a magic in graphs. The profile of a curve reveals in a flash a whole situation — the life history of an epidemic, a panic, or an era of prosperity. The curve informs the mind, awakens the imagination, convinces.
- Henry D. Hubbard, 1939
8. First we have data…
It is a capital mistake to theorize before one has data.
- Sherlock Holmes, A Scandal in Bohemia (Arthur Conan Doyle)
7. …the rest is built on top
You can have data without information, but you cannot have information without data.
6. He may not have always played by the book, but he knew what was needed to get the job done
The most valuable commodity I know of is information.
- Gordon Gekko, Wall Street (1987)
5. A reminder to be careful in your analysis and not to stretch to get the results you’d like
Torture the data, and it will confess to anything
- Ronald Coase, Nobel laureate in economics
4. People take good care of data that is important to them
Data that is loved tends to survive
- Kurt Bollacker, Data Scientist, Freebase/Infochimps
3. …and, like most good things, it just improves with age
Data matures like wine, applications like fish
- James Governor (see comment below; originally credited here to Andy Todd)
2. What use are statistics anyway?
In times like these when unemployment rates are up to 13%, income has fallen by 5% and suicide rates are climbing I get so angry that the government is wasting money on things like collection of statistics!
- From Hans Rosling’s The Joy of Stats
1. Finally, my very favorite data quote (and principle in life)
If we have data, let’s look at data. If all we have are opinions, let’s go with mine.
- Jim Barksdale, former Netscape CEO
Additional submissions are welcome in the comments below. What are your favorite data quotes?
Tim Berners-Lee’s missing star
Most of you Open Data enthusiasts out there will be familiar with Tim Berners-Lee’s five-star system, a no-nonsense rating system for the usefulness of an openly released data set:
- 1 star for releasing data at all (even a PDF of scanned paper)
- 2 stars for releasing it in structured, machine-readable formats (e.g. Excel file)
- 3 stars for releasing it using non-proprietary file formats (e.g. CSV file)
- 4 stars for releasing it as linked open data
- 5 stars for linking the data to other linked data sources
For those not up to speed, here’s Sir Tim explaining in a short video (first 2 minutes will do it for this purpose):
As stated in Matt’s earlier post in praise of CSV, we firmly believe that the biggest bang for the buck comes from reaching 3 stars fast and then aiming for the fourth and fifth stars as part of your organization’s long-term data platform strategy.
However, there is a missing star in Tim’s grading system. Releasing your data in CSV or another structured, machine-readable, non-proprietary format is certainly worthy of three stars, but if you are releasing dozens, hundreds or even thousands of data sets, you should also aim to do so in a consistent, well-documented manner across all your data sets.
Why? Because a developer or a data scientist hacking away at your data should not have to determine the structure of each data set individually. They’ll want to be able to write a generic piece of code that slurps up any (or all) of your data sets in the same way. If you have 100 different data sets, structured in 100 slightly different ways, it will take them almost 100 times longer to make use of all your valuable data.
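As a minimal sketch of what consistency buys, assume (hypothetically) that every data set is published as a CSV file with the same three columns, `date`, `region` and `value`. One generic function can then load any of them; the data set names and contents below are made up for illustration.

```python
import csv
import io
from typing import Iterable

# Hypothetical shared layout: every published data set uses these columns, in this order.
EXPECTED_COLUMNS = ["date", "region", "value"]


def load_dataset(f: Iterable[str]) -> list[dict[str, object]]:
    """Parse one CSV data set that follows the shared layout."""
    reader = csv.DictReader(f)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [
        {"date": row["date"], "region": row["region"], "value": float(row["value"])}
        for row in reader
    ]


# Two hypothetical data sets; in practice these would be downloaded files.
published = {
    "unemployment": "date,region,value\n2012-01,North,7.5\n2012-02,North,7.3\n",
    "population": "date,region,value\n2012-01,North,120000\n",
}

# Because the structure is consistent, one loop handles every data set.
all_data = {name: load_dataset(io.StringIO(text)) for name, text in published.items()}
print({name: len(rows) for name, rows in all_data.items()})  # {'unemployment': 2, 'population': 1}
```

If just one file used different column names or ordering, this single loop would fail on it, and every consumer of your data would need special-case code for that file.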
The same goes for discoverability: provide proper, machine-readable directories of the available data. And the same goes for associating data with its metadata, whether the metadata is embedded in the data file or provided in separate files with a clear association (see Matt’s post for details).
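A machine-readable directory can be as simple as a JSON file listing each data set alongside its data and metadata files. The layout and field names below are hypothetical, not a standard (established vocabularies such as W3C’s DCAT exist for this purpose); the sketch only shows how a consumer could discover everything without scraping web pages.

```python
import json

# A hypothetical machine-readable directory; the field names are illustrative only.
CATALOG_JSON = """
{
  "datasets": [
    {"id": "unemployment", "data": "unemployment.csv", "metadata": "unemployment.meta.json"},
    {"id": "population", "data": "population.csv", "metadata": "population.meta.json"}
  ]
}
"""


def list_datasets(catalog_text: str) -> dict[str, tuple[str, str]]:
    """Map each data set id to its (data file, metadata file) pair."""
    catalog = json.loads(catalog_text)
    return {d["id"]: (d["data"], d["metadata"]) for d in catalog["datasets"]}


datasets = list_datasets(CATALOG_JSON)
for dataset_id, (data_file, meta_file) in datasets.items():
    print(dataset_id, data_file, meta_file)
```

Pairing every data file with its metadata file in one place is what gives the “clear association” a program can follow.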
Oh, and you want to avoid preparing the files by hand, even for final touch-ups; it will lead to mistakes. If you must, make sure you write tests that check for consistent structure and other possible errors before publishing a data set.
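Here is a minimal sketch of such a pre-publication check in Python, again assuming the hypothetical `date`, `region`, `value` layout. It reports problems rather than fixing them, so a publishing pipeline could refuse to release a data set whenever the returned list is non-empty.

```python
import csv
import io

EXPECTED_COLUMNS = ["date", "region", "value"]  # hypothetical shared layout


def check_dataset(text: str) -> list[str]:
    """Return the problems found in one CSV data set (an empty list means it passed)."""
    problems: list[str] = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        problems.append(f"header is {header}, expected {EXPECTED_COLUMNS}")
        return problems
    # Data rows start on line 2, right after the header.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(f"line {line_no}: expected {len(EXPECTED_COLUMNS)} fields, got {len(row)}")
            continue
        try:
            float(row[2])
        except ValueError:
            problems.append(f"line {line_no}: value {row[2]!r} is not a number")
    return problems


good = "date,region,value\n2012-01,North,7.5\n"
bad = "date,region,value\n2012-01,North,n/a\n2012-02,North\n"
print(check_dataset(good))  # []
print(check_dataset(bad))   # reports a non-numeric value on line 2 and a missing field on line 3
```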
So, that said, here’s our revised version of Tim Berners-Lee’s 5 star system:
- 1 star for releasing data at all (even a PDF of scanned paper)
- 2 stars for releasing it in structured, machine-readable formats (e.g. Excel file)
- 3 stars for releasing it using non-proprietary file formats (e.g. CSV file)
- 3.5 stars for using a consistent format, discoverability methods and metadata associations across all your data sets
- 4 stars for releasing it as linked open data
- 5 stars for linking the data to other linked data sources
In your early open data initiatives, aim for at least 3.5 stars!
Why Open Data is all about Apps, and why it shouldn’t be!
Open data initiatives rock. In fact, without the trend of government and international organizations releasing their data under open licenses, DataMarket.com wouldn’t be so incredibly interesting. So obviously we love them!
Yet, I have something of a grudge against the emphasis on very specialized apps in Open Data initiatives. “Apps for this”, “Apps for that”, competitions, cute little prizes, etc., etc.
Now don’t get me wrong, many of these apps are great, but they only release a tiny fraction of the value in all the data that has been opened up. That’s certainly true of each single app, but it’s also true of them in aggregate. Here’s why.
Most, if not all, of the data that has been opened up has been published in a format that is relatively accessible to developers and other data-savvy people, but not so much to mere mortals. Therefore, for a successful app to emerge, three things have to come together, as depicted in this Venn diagram:
- Data has to be available
- There must be obvious user demand
- And as developers are the “keymakers” to all this data, there must be some developer incentive, be that money, coolness, recognition by peers or all of the above
Looking at the examples of successful open data apps out there, this pattern becomes quite obvious. Take the plethora of city data that has been opened up over the last 3 years or so. An overwhelming majority of the successful apps created on top of this data are transportation apps. All three elements are there: the data, the obvious need of millions of people, and the developer incentive to scratch their own itch, create awesomeness and make money (roughly in order of priority). And these apps are cool, I use some of them almost every day!
However, because all three requirements have to be met, a great portion of the data that has been opened is still just lying around unused. Again, Venn:
I’ll use another example of city data to explain: Sewage data. The data is there, but the demand may not be obvious, and there’s nothing sexy about making this data more accessible. I mean: “Who loves sewage information?”
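The Venn diagram boils down to a set intersection. Here is a toy Python sketch; the data set names and which circle each one falls into are purely hypothetical, chosen to mirror the transportation and sewage examples.

```python
# Hypothetical, illustrative classification of city data sets; not real survey data.
available = {"transit", "parking", "sewage", "zoning", "tree_inventory"}
obvious_demand = {"transit", "parking"}
developer_incentive = {"transit", "parking", "zoning"}

# An app tends to emerge only where all three circles overlap.
likely_apps = available & obvious_demand & developer_incentive
# Everything else that has been opened risks lying around unused.
unused = available - likely_apps

print(sorted(likely_apps))  # ['parking', 'transit']
print(sorted(unused))       # ['sewage', 'tree_inventory', 'zoning']
```

Most of the opened data ends up outside the intersection, which is the point of the argument: apps alone only unlock that small overlapping slice.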
Tell you what: I’m sure that if this data could be made more readily available to:
- …construction workers to prevent pipe cuts
- …environmentalists and policy makers to improve the regulatory environment
- …advertisers, to calculate the percentage of the halftime audience that missed their ad while taking a leak
…the overall social and economic benefits of better access to sewage data alone could be quite dramatic. And that’s just one example.
So, what I’m getting at is this: when thinking about Open Data initiatives, think beyond the apps. Think not only about the high-publicity use cases that are worth a few dollars to millions of users. Think also about the less sexy cases that can help a few people save us millions of dollars in aggregate, generate new insights and improve decision making on various levels.
Think about how you can encourage data portals to make the bulk of your data accessible to mere mortals, not only to developers. Think about how you can get existing software vendors to integrate your data, and how you can make business users and other decision makers aware that this data indeed exists.
There could be more to Open Data than a bunch of cool consumer apps.