Presentation at Web 2.0 Expo
Updated March 31 2011: I didn’t realize how immediately relevant this presentation would be. The MAIN point in the presentation was that I feared a setback in Open Data if we didn’t move fast to prove the value of these efforts to the public sector. Only two days later comes the news that the US government is closing down Data.gov!
I added a rough transcript of my talk below, but I also touched on this subject in this interview at Web 2.0 Expo.
Below are the slides from our session on “The Business of Open Data” at Web 2.0 Expo in March 2011:
As you are probably all aware, Open Data has been a rising trend over the past few years. Many governments, international organizations and other public sector institutions all over the world have made available vast amounts of data that was previously hard or impossible to get at, and these three gentlemen have been among the most influential in that process.
Obama announced on his very first day in office that all government agencies should make their data available as long as it didn’t threaten security, privacy or other higher priorities. He hired – for the first time in history – a dedicated CIO for the USA, Vivek Kundra, and the government has since opened up a data portal, data.gov, where thousands of data sets have been made readily available.
Tim Berners-Lee – the father of the world-wide web, no less – has been a key advocate for open data and has, among other things, advised the last two UK governments on setting policies and building the data.gov.uk portal. He also had a room full of TED attendees chanting “Raw data now” to great fanfare.
Last but not least – Hans Rosling, a Swedish professor and admittedly one of my greatest heroes – has shown many of us how data and open access can not only change our presumptions about the world and enlighten us, but also how it can be done in an extremely entertaining manner. With his wit and humour he has convinced organizations such as the World Bank to completely change their approach to data and open up their data collections.
And this trend is spreading. This map shows the countries where open government data portals have been established.
Admittedly the map is a bit misleading. Large bits of the world are colored, but these are only 14 out of the 193 countries of the world. But the trend is there and a new government data portal is opened almost every month now. I wouldn’t be surprised if this was already out of date.
And – as previously mentioned – international organizations such as the World Bank and the UN have also followed suit.
So, in short – this is an emerging trend. It’s off to a good start but there is still a long way to go.
Now think for a minute about what this means. It means that data, gathered over thousands of man-years, that was previously hard or impossible to get at, has now been made available to us to do with whatever we like.
To hammer that point home I sometimes talk about the people who lived here. This is a lighthouse called Hornbjargsviti in Northern Iceland, close to the Arctic Circle. For over 60 years people lived in this isolated place, and they had two responsibilities:
- to keep the light in the lighthouse lit, and
- to gather data about the weather.
Every 6 hours the inhabitants of this lonely place went out and noted down the temperature, wind, humidity, precipitation, visibility and so on and so forth. Every day, every week of every year for more than 60 years, like clockwork.
And this is just one weather station. There are over 200 of them in Iceland, each and every one with a similar story. And that’s still just weather data, in Iceland alone.
So when I say that thousands of man-years have been spent on gathering the data that is now being made available to us, I’m certainly not exaggerating. It’s probably hundreds of thousands of man-years.
Imagine the value in all that work – in all that data! The results of hundreds of thousands of man-years have been made available to us, just like that – and we can do with it what we like.
So what have we done with it so far?
There has been some cool innovation. Here’s a screenshot from Everyblock, a great site that uses data from a variety of open sources to create hyper-local media – something that matters to every block.
Some corruption has been exposed. Here’s an article detailing how open data exposed criminal activity in the charitable sector in Canada and saved Canadian taxpayers $3.2 billion.
We’ve seen some fantastic examples of data journalism with organizations such as The Guardian and NY Times – pictured here – leading the way.
And we’ve seen loads of great apps that use open data to make people’s lives easier. Some really innovative, really useful apps, many coming out of open data contests like these. These are all names of actual contests that have been held by governments. You’ll be quick to notice a trend. A focus on apps. A focus on simple, small things – still very useful – but each only tapping into a tiny bit of the vast amount of data that has been made available.
But wait a minute… we weren’t only promised small things. The most prominent advocates promised that Open Data would
- Cause a wave of innovation
- Change the face of journalism forever
- and rid us of corruption in politics and business for good
We were promised that Open Data would change the world… and it hasn’t.
Let’s get back to the data a bit. The data that has been made available – the hundreds of thousands of man-years that I mentioned earlier – has a lot of value. It can help pretty much every business, household and organization out there make better decisions and plan better for the future.
But most of them don’t know that … yet. Most of them don’t even realize the vast amount of data that is already there, open and free of charge, and how relevant it can be to them.
In order to realize the full potential of Open Data we must make ALL of the data easily accessible and help people discover it while keeping it open and free of charge.
But how do we do that? The world’s governments are hardly going to throw a lot of money at this at the moment, as they struggle to stay afloat amid debt and keep core services running. It can be hard enough to convince them that the tiny effort needed to make the data available in its raw form is worthwhile.
No, this will only happen if we find some business plans around Open Data that allow the private sector to tap into the value created – WITHOUT locking the data behind a paywall.
And this is what we’re trying to do at DataMarket.
This is the front page of our data portal: DataMarket.com. What we have here is a search engine for numerical information. I sometimes describe what we are building as “Google for statistics”, a place where you can – within 3-4 years – come and trust that if the numbers exist out there: Statistics, measurements, market research, financial data, whatever – you can find it there. And not only open, public data – but also premium data: data that is only available for a fee.
We’re not there yet, but DataMarket is still pretty darn impressive already. We’ve integrated data from over 50 organizations, including the UN, World Bank and Eurostat, and made all of that available to search, visualize, analyze and download, right there on DataMarket.com.
I’ll run you through a quick example – this is all live and available so you can do the same right there on your machines.
Let’s say we want to know something about gold prices. We simply type in the keywords and hit Enter.
The results that come back are data sets from our data providers. This particular query returns 13 data sets – a mixture of public and premium data. As a matter of fact, our first premium data sets were just launched today – so you can make that your soundbite if you plan to tweet, blog or write about this presentation :)
Click through to any of the results – the premium ones will ask you for payment information first – and this is what you immediately get: a nice-looking, interactive graph where you can further manipulate, compare, visualize or download the data, post it on Facebook or Twitter, or even get an embed code that allows you to post an interactive version of the graph you just made with any blog post or news article. Admittedly, this Guardian screenshot is a fake, but this is what it would look like: interactive and connected to the live underlying data.
And even though we have only just started, there’s plenty of data in there already:
- Over 100 million time series
- From more than 14 thousand data sets
- Spanning almost 800 years
…and the philosophy is that if the data is open and free to begin with, it’s still open and free on DataMarket.com, just easier to find and use.
So how do we make money then? What is the “Business of Open Data”? Looking at the players in the market we’ve identified several trends:
- Making and selling specialized apps like the ones we discussed earlier – each may have quite different business models based on the merits of the app itself
- Advertisements. This can be done if you can build enough traffic. This is in the end the business model for efforts such as Google Public Data and Wolfram|Alpha – both of which I define as “data markets” in a sense.
- Reading in the data and making it available through APIs for other developers to create apps as per the point above. This is the stated business model of companies such as Factual and Infochimps, and of the inconveniently named Windows Azure Marketplace DataMarket, launched last October.
- Selling pro subscriptions that give users access to additional features on top of the data, without limiting access to the data itself.
- and finally: Mixing open data with premium data that is only available to users who pay for access to it, effectively creating a data market where access to data becomes a traded product made available – not only to developers, but to the end users of data, answering their questions directly and improving their plans.
At DataMarket, our focus is on the last two, as we are an end-user-facing service, aiming to make all numerical data available to end users. We actually believe that this is quite a unique mix. Selling premium and proprietary data is already big business. Just think of Reuters and Bloomberg in terms of financial data; and Gartner, Forrester and IDC in market research. These services turn over billions of dollars annually, and have in some cases managed to make data that is – at least now – available open and free look proprietary: “faux-proprietary”, as a friend of mine called it. We’re not going after these guys with real-time financial data, but for the rest we’re coming in with a radically different model aimed at a far wider audience.
That said, we DO have an API and it IS the best place to connect programmatically to data from most of the providers we’ve already integrated with, but it is not core to our focus. The whole model – pricing and features alike – is set up so that we can make all the valuable data that has been opened up in recent government efforts more easily accessible to the public, while business users and other heavy users of the data can subscribe to features and/or data if they have the reasons and resources to do so.
To be honest, I think we – the Big Data Nerds of the world – must step up our efforts to prove the value of Open Data efforts to the governments and agencies involved.
We need to think BIG and look beyond apps and smaller geek projects to larger solutions that work with the bulk of the available data, so that whenever the public sector opens up a new data source in raw format, they know that it will be picked up and that potential users will be helped to realize its full value.