DataMarket blog

Data, visualization and startup life

Data and Visualization: Predictions for 2011


A lot of my time these days goes into planning DataMarket’s efforts in the new year. An essential part of that is trying to grasp the major trends in areas that matter to us.

DataMarket is building an active marketplace for statistics and structured data. We believe in a “visual data exploration” approach: a user’s first experience with any data set is a visualization that gives a quick overview of what the data is about. From there, users can dig deeper to see the raw numbers, download the data in various formats, embed it in other web content or connect to the data live using our API.

This vision, and our goals for the coming year – including our launch of an international data offering – frame the topics that I’ve been thinking about. For links to broader predictions in the fields of Big Data and Data-as-a-Service see the bottom of this post.

That said, here are the things I believe will shape our key areas of interest in 2011:

Data Markets

Data markets will become widely accepted as an emerging field.

I’ve previously defined these as “Services that make it easy to find data from a range of secondary data sources, then consume or acquire the data in a usable – and often unified – format. Several of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers.” I used the analogy of “Amazon-for-data”, but I see that others have started using “Data app stores”, which may in fact be closer to home. With at least 8-10 efforts to build such services, some already with significant VC backing or led by large corporations, the space is heating up.

I believe we’ll see many of these services differentiate themselves in 2011 by focusing on specific types of data. There are definitely opportunities in building specialized data markets for geospatial data, for statistics and for enormous scientific data sets – to name a few types – and each comes with their own challenges, target audiences and preferred approaches. In the spirit of doing one thing and doing it well, I think most of these projects will want to see success in one such segment of the market before generalizing – or consolidating.

Chart solutions

A couple of chart solutions will stand out in this important but crowded space by maturing and gaining a large developer following.

My bet is on open-source, HTML5-compatible, vector-based solutions written in JavaScript. Until now we have been using a commercial, Flash-based solution called amCharts. When we started our development in early 2009, it was simply the best option given our early requirements, which mainly involved rich interactivity on the UI side.

As our application has matured, our requirements have expanded:

  • We need more control to fix bugs, adjust the look and feel and implement features that are not supported by default in amCharts (or other similar solutions for that matter) = Open source.
  • We need to support iPad, iPhone and other devices and software that don’t run Flash + we kind of like standards = HTML5
  • We need to be able to render beautiful charts for high-resolution printing and bitmaps alike = Vector based
  • We still need a rich, interactive UI = JavaScript
  • We also need to be able to run the code both client and server-side, which has led to some serious experimentation with Node.js = JavaScript again

Based on these (and other) requirements, we are betting heavily on Protovis, a brilliant solution written by Mike Bostock and Jeff Heer of the Stanford Visualization Group. The drawback is lack of support for Internet Explorer 7 and 8 – something that can by no means be ignored for a business-oriented solution with a broad target audience – but we believe we’ve found suitable workarounds.

Other projects that I believe might rise above the rest are Highcharts and possibly gRaphaël or other Raphaël-based solutions. Here’s a list of alternatives (and another one).
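To make the “vector based” and “client and server-side” requirements a bit more concrete, here is a minimal sketch in TypeScript. It is not Protovis code and not our implementation; `Point`, `linePath` and `lineChartSvg` are hypothetical names made up for this example. The functions only turn numbers into an SVG string, so under that assumption the same code could run in a browser or under Node.js, and the output scales to any resolution.

```typescript
// Hypothetical helpers for illustration only; not part of Protovis or amCharts.
interface Point {
  x: number;
  y: number;
}

// Scale data points into a width x height box and return SVG path data.
function linePath(points: Point[], width: number, height: number): string {
  if (points.length === 0) return "";
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const xMin = xs.reduce((a, b) => Math.min(a, b));
  const xMax = xs.reduce((a, b) => Math.max(a, b));
  const yMin = ys.reduce((a, b) => Math.min(a, b));
  const yMax = ys.reduce((a, b) => Math.max(a, b));
  // Fall back to 1 to avoid dividing by zero when all values are equal.
  const xSpan = xMax - xMin || 1;
  const ySpan = yMax - yMin || 1;
  return points
    .map((p, i) => {
      const px = ((p.x - xMin) / xSpan) * width;
      // The SVG y axis points down, so flip the value.
      const py = height - ((p.y - yMin) / ySpan) * height;
      return (i === 0 ? "M" : "L") + px.toFixed(1) + "," + py.toFixed(1);
    })
    .join(" ");
}

// Wrap the path in a standalone SVG document (a plain string, no DOM needed).
function lineChartSvg(points: Point[], width: number, height: number): string {
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" ` +
    `viewBox="0 0 ${width} ${height}">` +
    `<path d="${linePath(points, width, height)}" fill="none" stroke="black"/>` +
    `</svg>`
  );
}
```

On the client the resulting string could be placed into the page; on the server it could be written to a file or handed to a separate rasterizer for bitmap output. A real charting library obviously does far more (axes, labels, interaction), which is exactly why we would rather build on an existing one.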

Data journalism

Online media, data journalists and bloggers will increasingly use ready-made or reusable technologies to enrich their stories with data, data visualization and charts.

A lot of the great efforts by leading media in the field, such as the NY Times and The Guardian, are built around a specific data set to tell a specific story. This is a high-cost, high-return approach, and even the largest media can only afford to do a few such stories a month. A lot of stories, however, will benefit from a simple chart or a map showing the big picture, then allowing the readers to dive in deeper if they want to test their own theories or do some analytics of their own.

An example could be an article on unemployment. The piece could include a graph showing nationwide unemployment for the last decade, then allow readers to dive in to compare the latest figures for different areas, see the development over a longer period or compare unemployment with – say – inflation rates. This should not be a specific project for that piece, but a generic solution that gives journalists themselves access to the underlying data and the ability to configure the setup, so that they can easily attach any relevant time series data to any article. Similar examples are easy to think of in geospatial data, large collections of text documents (think the Wikileaks files) and any other kind of data.
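A minimal sketch of what such a generic setup might look like, in TypeScript. All names here (`TimeSeries`, `ChartConfig`, `buildChartRows`) and all the numbers are invented for illustration; this is not an existing API, just one way the idea could be expressed. The journalist only writes the small configuration object, and the same function serves any article.

```typescript
// Hypothetical types and data, for illustration only.
interface TimeSeries {
  id: string;
  label: string;
  values: Record<string, number>; // period (e.g. a year) -> value
}

interface ChartConfig {
  title: string;
  seriesIds: string[]; // which series to show together
  from?: string; // optional first period to include
  to?: string; // optional last period to include
}

// Build one row per period that every selected series has a value for.
function buildChartRows(
  config: ChartConfig,
  catalog: TimeSeries[]
): Array<Record<string, string | number>> {
  const selected = config.seriesIds.map((id) => {
    const match = catalog.filter((s) => s.id === id)[0];
    if (!match) throw new Error("Unknown series: " + id);
    return match;
  });
  if (selected.length === 0) return [];
  const periods = Object.keys(selected[0].values)
    .filter((p) => selected.every((s) => p in s.values))
    .filter(
      (p) =>
        (config.from === undefined || p >= config.from) &&
        (config.to === undefined || p <= config.to)
    )
    .sort(); // plain string sort; fine for four-digit years
  return periods.map((p) => {
    const row: Record<string, string | number> = { period: p };
    selected.forEach((s) => {
      row[s.id] = s.values[p];
    });
    return row;
  });
}

// Made-up numbers, not real statistics.
const catalog: TimeSeries[] = [
  { id: "unemployment", label: "Unemployment rate (%)", values: { "2008": 5, "2009": 8, "2010": 9 } },
  { id: "inflation", label: "Inflation rate (%)", values: { "2009": 1, "2010": 2 } },
];

const rows = buildChartRows(
  { title: "Unemployment vs. inflation", seriesIds: ["unemployment", "inflation"], from: "2009" },
  catalog
);
```

Here `rows` holds one entry each for 2009 and 2010, the periods both series cover, ready to be handed to whatever chart component the site uses.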

Something along the lines of ManyEyes and Gapminder rather than special (nonetheless great) efforts like NY Times’ Budget Porcupine Graph and The Guardian’s US embassy cables.

VC activity

There will be plenty of VC activity in the Big Data / Data-as-a-Service / Data Markets fields in 2011.

Most people seem to agree that VC spending in general will be on the rise in 2011, and early stage funding for companies such as InfoChimps and Timetric and larger funding rounds for companies such as Socrata and Factual indicate that this space will be one of the hot topics for VCs in the coming year.

Some larger-name VC funds, such as Andreessen Horowitz, Morgenthaler Ventures and Benchmark Capital, are already involved, and many others are at least exploring options in the space. There is even at least one early-stage investment fund dedicated to Big Data.

- - -

There are obviously many more trends and developments that will shape the year to come, but from our perspective these are some of the most important.

If you’re interested in a broader look at what 2011 may bring in the field of Data-as-a-Service or “Big Data”, here are a few prediction posts I’ve come across:

Written by Hjalmar Gislason

December 30, 2010 at 5:44 pm


9 Responses


  1. What about data storage solutions? I have a feeling that emerging data market companies will sooner or later break away from using “traditional” RDBMS databases for storage and turn more towards so-called NoSQL key-value datastores, such as Riak, CouchDB and Hadoop.
    It’s not only a question of scalability and performance, it’s also a question of storing data in a non-conformant way and extracting information using MapReduce or custom techniques (specific to a certain domain of information).

    Alex

    December 30, 2010 at 7:18 pm

    • I believe you are right Alex. These solutions will play a huge role in the future of Big Data – and pretty much anything that is really cloud-based will probably use such solutions rather than RDBMS in the long run.

      Hjalmar Gislason

      January 1, 2011 at 1:43 pm

  2. Thanks for this article, just wanted to show you the Wikileaks cables interface we did for Le Monde in France : http://www.lemonde.fr/documents-wikileaks/visuel/2010/12/06/wikileaks-lire-les-memos-diplomatiques_1449709_1446239.html#ens_id=1446739

    Best.

    Guilhem Fouetillou

    January 4, 2011 at 9:54 am

  3. Some nice predictions here. I have a comment about Protovis and ManyEyes

    Protovis is incredible and can be used to create some stunning visualisations. I am really impressed by it. Without big changes, I think it will remain a niche product. Why? To make a visualisation that works well, you need to be comfortable with scripting/programming techniques. For this reason alone, it won’t hit the mass market.

    ManyEyes, on the other hand, is better designed for the non-programmer, mass market. However, it too is flawed. It is not customisable enough and its visualisations just aren’t good enough, IMHO.

    Google Charts is one player that I think deserves a mention. Their charting engine is excellent. It doesn’t provide interactivity, mind. However, the fact that you can return a PNG file from a URL is really powerful, and if someone can build an intuitive chart-builder UI, this could take off.

    Andy Cotgreave

    January 11, 2011 at 12:35 pm

    • Good points. I would agree with you if we were talking about end-user products, but I’m talking specifically about libraries for developers who are creating visualization-heavy solutions, such as our DataMarket.

      ManyEyes is fantastic, but it’s a totally different beast.

      Google Charts – on the other hand – can work as such a library. There are several reasons that it wouldn’t suit OUR needs specifically, but you’re right that it deserves a mention.

      Hjalmar Gislason

      January 11, 2011 at 12:46 pm

      • Thanks for clarifying, Hjalmar. I came at this thinking about which visualisation tools would be adopted by data geeks who don’t have a developer background. That market is potentially larger than just developers and the data market company that can tap into that area will reap the most benefits.

        None of which is news to you, I’m sure :-) And from what I can see, you’re focusing on these people too by providing the download formats you do.

        Good luck with the venture – I’ve only discovered your site today – it looks very exciting.

        Andy Cotgreave

        January 11, 2011 at 1:02 pm

      • Thanks Andy. Hope you saw that we’re launching internationally shortly. Any feedback is obviously welcomed.

        Hjalmar Gislason

        January 11, 2011 at 1:24 pm


