DataMarket blog

Data, visualization and startup life

Archive for the ‘Uncategorized’ Category

DataMarket – Energy: New Business enabled by Open Data

with 2 comments

Last week DataMarket introduced a new product, an energy-specific data service called, simply, DataMarket – Energy.

The venue at which we introduced the service was quite unusual. We were lucky enough to be invited – along with a selected group of other startups and innovators working with energy data – to present our work at the White House at an event called Energy Datapalooza. We have since jokingly said that in order to top this venue for our next product announcement, we will have to book the International Space Station. I’m working on that.

Here’s a short video of my presentation there and the unveiling of our new service:

Those of you who have been following DataMarket for a while will notice that the business model for this new product is significantly different from what we have previously been running with.

When we originally kicked DataMarket.com off with international data in early 2011, there was only one thing users could pay us for: a low-priced premium subscription that gave access to additional features, such as more advanced data export formats, automated reports and a few other things. A couple of months later we added the first premium data to the site: data from premium data providers such as the Economist Intelligence Unit (see the EIU data on DataMarket), resold through our site.

However, using the site’s core functionality – the ability to search, visualize, compare, and download data from the vast collections of Open Data that we aggregate – has always been free. As such, DataMarket.com has become quite popular in certain circles. But quite frankly, the two revenue sources have not taken off in a big way.

What has taken off, however, is our technology licensing business. We’ve seen high demand for our data delivery technology from other information companies. The ability to normalize data from a wide variety of data sources and let users access that data through powerful search and online visualization tools is something many information companies, such as market research and financial data companies, have identified a strong need for. So last February we formally introduced our data publishing tools, most prominently what we now call the Data Delivery Engine, a white-label solution that is already up and running for a few well-known information companies (including Yankee Group and Lux Research), with several others in the implementation stages. This licensing business is where most of our revenue comes from today, so one could really say that we’re now more of a software company than a data company.

The upcoming launch of DataMarket – Energy is another stab at the data side of the equation, but the approach is different in several ways:

  • Focus and scope: By focusing on a single industry or vertical we can make the service much more relevant to its users. Instead of solving 10-15% of everybody’s data needs with the kind of macro-economic and demographic data that can be accessed on DataMarket.com, we aim to address 90-100% of the data needs of a much more targeted audience.
  • Premium access: We’re selling access to this service at a substantial premium (final pricing is still being decided). Those who see value in the discovery and aggregation services that we add on top of the data will be charged for the “job they hire our product to do”. This indeed means that some data that has been made publicly available for free (Open Data) will only be available to DataMarket users behind a paywall. As explained in the presentation above, that doesn’t take the least bit away from the value of the Open Data. On the contrary: the data is still available in its original form from the publishing organizations, but we add a choice on top of that: a nicer and more user-friendly way to access the data for those who are willing to pay for that value-add.
  • Targeted sales: Instead of relying as much on PR and viral distribution as we have with DataMarket.com, we’ll use more direct, traditional sales approaches for this new service.

One of the interesting things about running a technology startup is that the same technology can be turned into so many different products without a single line of additional code. Often the only difference is how you promote it, price it and sell it. This can be both a curse and a blessing, and usually a few things need to be thrown at the wall before you find what sticks. Luck is involved too, but as the famous Swedish alpine skier Ingemar Stenmark is quoted as saying: “The more I practice, the luckier I get.”

It will be interesting to see if we’ve practiced our data marketing skills enough for the DataMarket – Energy approach to work out.

Written by Hjalmar Gislason

October 12, 2012 at 8:16 pm

Posted in Uncategorized

Best Practices for Publishing Data

with one comment

Slides from a presentation given by Hjalmar Gislason, founder and CEO of DataMarket, at the Strata Conference in London, October 2012.

Written by Hjalmar Gislason

October 2, 2012 at 3:08 pm

Posted in Uncategorized

Worse than a 3D pie chart

with 9 comments

I have seen my share of good charts and I have seen my share of bad charts, but I never expected what I saw today.

As you may know, Hjalli and I are writing a book about chart design. We will guide you through choosing the best chart for your story and creating beautiful and effective charts. The book will be aimed at those who want, or need, to get a chart out there but aren’t that interested in the whys. We start by looking at the charts the big boys do by default and go from there, examining the parts and how to improve them.

The first chapter on chart design is about tables, which was fun to write. There was more to say than we expected.

The next chapter focuses on line charts, where I used Numbers, Excel and DataGraph to create default versions of a line chart. As I knew there would be, there were things that could have been better designed in the defaults of all three applications. None of the defaults is useable, in our opinion; Numbers least of all.

Today, I dove into the details of my bar chart design. Called up the author of DataGraph to discuss moving axis labels by a pixel. Stared at my screen for half an hour, wondering if I want to keep the x axis on the bar chart or not. Then I opened up Numbers and Excel to create the defaults. DataGraph was already open, since I can almost do perfect charts in it already. So I started with the DataGraph default. It disappointed me a bit. The y axis didn’t automatically label my bars. Other than that, it was not pretty but useable. Numbers, to be fair, does automatically label the bars.

Next up was Numbers. At first glance it looked fine, the color of the bars was okay and the bars were labeled correctly. As I checked off items in the designing-a-bar-chart list in my head, everything seemed fine. Until it didn’t. At all.

In disbelief, I went straight to Excel to see if this alarm went off there as well. It did. And there was much wailing and gnashing of teeth. I felt like George Taylor in Planet of the Apes: “You Maniacs! You blew it up! Ah, damn you! God damn you all to hell!”

The default bar charts from Apple Numbers and Microsoft Excel.

They didn’t include the zero on the x axis! This is no small omission. Their default bar chart is a lie: when comparing bars, the reader compares their full lengths, so cutting the axis off above zero exaggerates the differences between them.

You can tell both applications to include the zero, but that should not be needed. Creating a bar or column chart without the zero on the axis shouldn’t even be possible. This is worse than a 3D pie chart.

There, I’ve said it.

Written by Þorri

September 13, 2012 at 10:21 pm

Posted in Uncategorized

The 11 Best Data Quotes

with 11 comments

Since before starting DataMarket back in 2008, I’ve been collecting funny, insightful and thought-provoking quotes about data and information. Here is my current list of top 11 favorites:

  • 11. Many have tried to describe the importance of data in industrial, or even agricultural terms

    Data are becoming the new raw material of business

    - Craig Mundie, head of research and strategy, Microsoft

    Data is the new oil!

    - Clive Humby, ANA Senior marketer’s summit, 2006

    Information is the oil of the 21st century, and analytics is the combustion engine.

    - Peter Sondergaard, senior vice president at Gartner

    Data is the new oil? No: Data is the new soil.

    - David McCandless, TEDGlobal, 2010

  • 10. …others in terms of previous breakthroughs in IT

    Data is the Next Intel Inside

    - Tim O’Reilly, What Is Web 2.0

  • 9. The love of data visualization is not new

    There is a magic in graphs. The profile of a curve reveals in a flash a whole situation — the life history of an epidemic, a panic, or an era of prosperity. The curve informs the mind, awakens the imagination, convinces.

    - Henry D. Hubbard, 1939

  • 8. First we have data…

    It is a capital mistake to theorize before one has data.

    - Sherlock Holmes, A Study in Scarlet (Arthur Conan Doyle)

  • 7. …the rest is built on top

    You can have data without information, but you cannot have information without data.

    - Daniel Keys Moran

  • 6. He may not have always played by the book, but he knew what was needed to get the job done

    The most valuable commodity I know of is information.

    - Gordon Gekko, Wall Street (1987)

  • 5. A reminder to be careful in your analysis and not to stretch to get the results you’d like

    Torture the data, and it will confess to anything

    - Ronald Coase, Nobel Prize Laureate in Economics

  • 4. People take good care of data that is important to them

    Data that is loved tends to survive

    - Kurt Bollacker, Data Scientist, Freebase/Infochimps

  • 3. …and – like most good things – it just improves with age

    Data matures like wine, applications like fish

    - Andy Todd / James Governor (see comment below)

  • 2. What use are statistics any way?

    In times like these when unemployment rates are up to 13%, income has fallen by 5% and suicide rates are climbing I get so angry that the government is wasting money on things like collection of statistics!

    - From Hans Rosling’s The Joy of Stats

  • 1. Finally, my very favorite data quote (and principle in life)

    If we have data, let’s look at data. If all we have are opinions, let’s go with mine.

    - Jim Barksdale, former Netscape CEO

Additional submissions welcomed in comments below. What are your favorite data quotes?

Written by Hjalmar Gislason

July 8, 2012 at 6:54 pm

Posted in Uncategorized

Tim Berners-Lee’s missing star

with 4 comments

Most of you Open Data enthusiasts out there will be familiar with Tim Berners-Lee’s five star system, a no-nonsense rating system for the usefulness and utility of an openly released data set:

1 star for releasing data at all (even PDF of scanned paper)
2 stars for releasing it in structured, machine-readable formats (e.g. Excel file)
3 stars for releasing it using non-proprietary file formats (e.g. CSV file)
4 stars for releasing it as linked open data
5 stars for linking the data to other linked data sources

For those not up to speed, here’s Sir Tim explaining in a short video (first 2 minutes will do it for this purpose):

As stated in Matt’s earlier post in praise of CSV, we firmly believe that the biggest bang for the buck comes from reaching 3 stars fast and then aiming for the fourth and fifth stars as part of your organization’s long-term data platform strategy.

However, there is a missing star in Tim’s grading system. Releasing your data in CSV or other structured, machine-readable, non-proprietary format is certainly worthy of three stars, but if you are releasing dozens, hundreds or even thousands of data sets, you should also aim to do so in a consistent, well-documented manner across all your data sets.

Why? Because a developer or a data scientist hacking away at your data should not have to determine the structure of each data set individually. They’ll want to be able to write a generic piece of code that slurps up any (or all) of your data sets in the same way. If you have 100 different data sets, structured in 100 slightly different ways, it will take them almost 100 times longer to make use of all your valuable data.

The same goes for the discoverability of the available data: provide proper, machine-readable directories. And for the association of metadata, whether embedded in the data file or provided in separate files with a clear link to the data (see Matt’s post for details).

Oh, and you want to avoid having the files prepared by hand, even for final touch-ups; it will lead to mistakes. If you do, make sure you write tests that check for your consistent structure and other possible errors before publishing a data set.
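To make that concrete, here is a minimal sketch – in Node.js, with a hypothetical directory layout and column structure, not our actual setup – of the kind of pre-publication check we have in mind: the agreed structure gets verified by code, not by eye, before anything goes out.

```javascript
// check-structure.js -- a minimal pre-publication consistency check (illustrative sketch only;
// the directory name and expected header below are hypothetical).
var fs = require('fs');
var path = require('path');

var EXPECTED_HEADER = 'region,year,indicator,value'; // the agreed column structure
var RELEASE_DIR = './release';                       // where the CSV files to be published live

var failures = [];

fs.readdirSync(RELEASE_DIR).forEach(function (name) {
  if (path.extname(name) !== '.csv') return;
  var firstLine = fs.readFileSync(path.join(RELEASE_DIR, name), 'utf8').split(/\r?\n/)[0].trim();
  if (firstLine !== EXPECTED_HEADER) {
    failures.push(name + ': header is "' + firstLine + '"');
  }
});

if (failures.length > 0) {
  console.error('Do not publish - inconsistent data sets:\n' + failures.join('\n'));
  process.exit(1);
}
console.log('All data sets share the expected structure.');
```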

So, that said, here’s our revised version of Tim Berners-Lee’s 5 star system:

1 star for releasing data at all (even PDF of scanned paper)
2 stars for releasing it in structured, machine-readable formats (e.g. Excel file)
3 stars for releasing it using non-proprietary file formats (e.g. CSV file)
3.5 stars for using a consistent format, discoverability methods and metadata associations across all your data sets
4 stars for releasing it as linked open data
5 stars for linking the data to other linked data sources

In your early open data initiatives, aim for at least 3.5 stars!

Written by Hjalmar Gislason

May 25, 2012 at 1:06 pm

Posted in Uncategorized

Why Open Data is all about Apps, and why it shouldn’t be!

with 8 comments

Open data initiatives rock. In fact, without the trend of government and international organizations releasing their data under open licenses, DataMarket.com wouldn’t be so incredibly interesting. So obviously we love them!

Yet, I have something of a grudge against the emphasis on very specialized apps in Open Data initiatives. “Apps for this”, “Apps for that”, competitions, cute little prizes, etc., etc.

Now don’t get me wrong, many of these apps are great, but they only release a tiny fraction of the value in all the data that has been opened up. That’s certainly true of each single app, but it’s also true of them in aggregate. Here’s why.

Most, if not all, of the data that has been opened up has been published in a format that is relatively accessible to developers and other data-savvy people, but not so much for consumption by mere mortals. Therefore, in order for a successful app to emerge, three things have to come together, as depicted in this Venn diagram:

  • Data has to be available
  • There must be obvious user demand
  • And as developers are the “keymakers” to all this data, there must be some developer incentive, be that money, coolness, recognition by peers or all of the above

Looking at the examples of successful open data apps out there, this pattern becomes quite obvious. Take the plethora of city data that has been opened up over the last 3 years or so. An overwhelming majority of the successful apps created on top of this data are transportation apps. All three elements are there: the data, the obvious need of millions of people, and the developer incentive to scratch their own itch, create awesomeness and make money (roughly in order of priority). And these apps are cool, I use some of them almost every day!

However, because of these three requirements, a great portion of the data that has been opened up is still just lying around unused. Again, a Venn diagram:

I’ll use another example of city data to explain: Sewage data. The data is there, but the demand may not be obvious, and there’s nothing sexy about making this data more accessible. I mean: “Who loves sewage information?”

Tell you what. I’m sure that if this data could be made more readily available to:

  • …construction workers to prevent pipe cuts
  • …environmentalists and policy makers to improve the regulatory environment
  • …advertisers to calculate the percentage of the half-time audience that missed their ad when taking a leak

…the overall social and economic benefits of better access to sewage data alone could be quite dramatic. And that’s just one example.

So, what I’m getting at is this: When thinking about Open Data initiatives, think beyond the apps. Think not only about the high-publicity use cases that are worth a few dollars to millions of users. Think also about the less sexy cases that can help a few people save us millions of dollars in aggregate, generate new insights and improve decision making on various levels.

Think about how you can encourage data portals that make the bulk of your data accessible to mere mortals, not only to developers. Think how you can get existing software vendors to integrate your data, and how you can make business users and other decision makers aware that this data indeed exists.

There could be more to Open Data than a bunch of cool consumer apps.

Written by Hjalmar Gislason

May 22, 2012 at 5:40 am

Posted in Uncategorized

In praise of CSV

with 4 comments

Is there such a thing as the perfect data format? No, of course not, but does anything come close? Yes. Trusty old comma-separated values, or CSV.

CSV gets a lot of flak and I think it’s due a little TLC. It doesn’t excite anyone, it’s unfashionable, and it’s old technology — these are all good things for a data format, where you don’t want fast-changing fads to get in the way of data communication. Yes it has its blemishes, but who doesn’t? It’s an excellent fit for statistical data, so do away with the trouble of finding that perfect format and demand CSV for six supremely practical reasons:

  1. CSV isn’t proprietary. CSV has existed for decades and no-one owns the format. You needn’t worry about paying to use it or buying proprietary software to open and save it. Every spreadsheet application supports it and since CSV is open and unchanging, every spreadsheet application will continue to support it for a long time.
  2. Excel supports CSV. Whether we like it or not, much of the data that comes from governments, statistics agencies, and companies is stored in Excel spreadsheets, and while these are theoretically machine-readable they tend towards an ambiguity and complexity that’s difficult for computer programs to understand. The older and more widely used Excel formats are proprietary (newer versions aim to change that but haven’t been entirely successful) and contain bugs; macros and formulas abound, pie charts are embedded all over the place, and the data hierarchies created by its users (I include myself here) can often be ambiguous and hard for a computer program to comprehend. Many of these problems are solved by saving a spreadsheet to CSV, and either you or your source can convert an Excel spreadsheet to a CSV with a few clicks of a mouse button.
  3. CSV and non-technical people are friends. You’re not likely to be able to demand that data is provided in a particular format, and you’re even less likely to be able to demand that wonderful format you’ve invented. You’ll be lucky to get Excel documents. So asking for CSV is a good bet and risk-free. People can understand it and non-technical staff can make it for you.
  4. CSV is tabular data. If you want to keep the data permanently, or if you’re going to do any serious data manipulation, you’re almost certainly going to put it in a relational database. CSV is very well suited for this because its structure is identical to a database table. It won’t be in third normal form, but it will be easy to convert it into third normal form, and it’s easy to (programmatically) pivot if you need to.
  5. CSV is incredibly easy to parse. CSV is unusual in that no formal specification exists for the format, but that doesn’t mean you’ll have difficulty parsing it with a computer program. The closest thing to a spec is RFC 4180; its definition of the format runs to seven bullet points and just over 300 words. And you’ll be hard-pressed to find a programming language that doesn’t come with a CSV parser built in (a simplified parsing sketch follows this list).
  6. Tim Berners-Lee likes it. “Save the best for last”, as the saying goes, and this one’s a corker. Tim Berners-Lee, the man who invented the Web, has a five-star system for open data, and using CSV immediately gets you three stars: by making your data “available as machine-readable structured data […] plus non-proprietary format (e.g. CSV instead of Excel)”. Getting the fourth and fifth stars is more difficult (it involves a lot more theoretical heavy-lifting) but getting three stars from Tim Berners-Lee can only be a good thing.
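As a footnote to point 5, the sketch below is a deliberately simplified CSV parser in JavaScript, handling commas, double-quoted fields and "" escapes in the spirit of RFC 4180. It is illustrative only; in practice you should reach for the parser that ships with, or alongside, your language of choice.

```javascript
// parseCsv: a deliberately simplified CSV parser (illustrative only).
// Handles commas, double-quoted fields and "" escapes, but not every edge case.
function parseCsv(text) {
  var rows = [], row = [], field = '', inQuotes = false;
  for (var i = 0; i < text.length; i++) {
    var c = text[i];
    if (inQuotes) {
      if (c === '"' && text[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (c === '"') { inQuotes = false; }                    // closing quote
      else { field += c; }
    } else if (c === '"') {
      inQuotes = true;                                             // opening quote
    } else if (c === ',') {
      row.push(field); field = '';                                 // field separator
    } else if (c === '\n' || c === '\r') {
      if (c === '\r' && text[i + 1] === '\n') i++;                 // swallow CRLF
      row.push(field); rows.push(row); field = ''; row = [];       // record separator
    } else {
      field += c;
    }
  }
  if (field !== '' || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}

// Example: parseCsv('year,gdp\n2010,"1,234"\n')
// => [['year', 'gdp'], ['2010', '1,234']]
```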

CSV isn’t perfect, and the most obvious downsides are its lack of support for metadata and character encodings. If you want metadata for your CSV you’ll either need to store it elsewhere — probably on a publicly-accessible server — or squeeze it into the data file itself in an ugly fashion.

The first idea is great if done correctly. To paraphrase Tim Berners-Lee, if you generate a small, separate metadata file for each data file, the results can be harvested and, like the data itself, distributed as linked data. Any open dataset can be registered at thedatahub.org, data.gov.uk, and data.gov, among others.
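For illustration, one simple convention (our own hypothetical example, not a formal standard) is a small JSON sidecar next to each data file, written here with Node.js:

```javascript
// write-metadata.js -- illustrative sketch of a sidecar metadata convention (hypothetical, not a standard).
// For an imaginary data file unemployment-rate.csv we write unemployment-rate.meta.json alongside it,
// so harvesters can pick up the title, source, licence and encoding separately from the data.
var fs = require('fs');

var metadata = {
  title: 'Unemployment rate by region',        // human-readable title
  source: 'Example Statistics Office',         // publishing organization (made up)
  license: 'CC-BY',                            // terms of reuse
  encoding: 'UTF-8',                           // character encoding of the CSV
  columns: ['region', 'year', 'rate_percent']  // documented column order
};

fs.writeFileSync('unemployment-rate.meta.json', JSON.stringify(metadata, null, 2));
```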

But what’s more likely is that the metadata will be dumped at either the beginning or the end of the CSV file as if it were a second embedded set of CSV keys and values, and it will cause you some minor trouble.

There’s also the perennial problem of character-encoding. A CSV file has no in-built way to describe what character-encoding it uses, so you’re out of luck unless it’s been downloaded from a server that sends a Content-Type header — and even that shouldn’t be trusted. Instead, resign yourself to asking for a particular character-encoding and cushioning yourself with a heuristic.
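By way of example, here is the sort of crude heuristic we mean, sketched in Node.js: decode as UTF-8 first and fall back to Latin-1 if the result contains replacement characters. It is a cushion, not a guarantee.

```javascript
// readCsvText: a crude character-encoding heuristic (illustrative only, not foolproof).
// Try UTF-8 first; if the decoded text contains the U+FFFD replacement character,
// assume the file was Latin-1 (ISO-8859-1) instead.
var fs = require('fs');

function readCsvText(filename) {
  var raw = fs.readFileSync(filename);    // a Buffer of undeclared encoding
  var asUtf8 = raw.toString('utf8');
  if (asUtf8.indexOf('\uFFFD') === -1) {
    return asUtf8;                        // decoded cleanly as UTF-8
  }
  return raw.toString('latin1');          // fallback guess
}
```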

But don’t let those two minor issues put you off: as Winston Churchill was once overheard saying, CSV really is the least worst data format. It provides a format that is both programmatically easy to read and simple for non-technical people to manage. It might not be perfect but it comes as close as is practically possible.

Written by Matt Riggott

April 17, 2012 at 10:24 am

Posted in Uncategorized


Choosing the right visualization tool for your task

with 9 comments

We’re frequently asked: What is the best tool to visualize data?

There is obviously no single answer to that question. It depends on the task at hand, and what you want to achieve.

Here’s an attempt to categorize these tasks and point to some of the tools we’ve found to be useful to complete them:

The right tool for the task

Simple one-off charts

The most common tool for simple charting is clearly Excel. It is possible to make near-perfect charts of most chart types using Excel – if you know what you’re doing. But many Excel defaults are sub-optimal, and some of the chart types it offers are simply for show and have no practical application. 3D cone-shaped “bars”, anyone? And Excel makes no attempt at guiding a novice user to the best chart for what she wants to achieve. Here are three alternatives we’ve found useful:

  • Tableau is fast becoming the number one tool for many data visualization professionals. It’s client software (Windows only) that’s available for $999 and gives you a user-friendly way to create well-crafted visualizations on top of data that can be imported from all of the most common data file formats. Common charting in Tableau is straightforward, while some of the more advanced functionality may be less so. Then again, Tableau enables you to create pretty elaborate interactive data applications that can be published online and work on all common browser types, including tablets and mobile handsets. If you are a non-programmer who sees data visualization as an important part of your job, Tableau is probably the tool for you.
  • DataGraph is a little-known tool that deserves a lot more attention. A very different beast, DataGraph is a Mac-only application ($90 on the App Store), originally designed to create proper charts for scientific publications, but it has become a powerful tool for creating a wide variety of charts for any occasion. Nothing we’ve tested comes close to DataGraph when creating crystal-clear, beautiful charts that are also done “right” as far as most of the information visualization literature is concerned. The workflow and interface may take a while to get to grips with, and some of the more advanced functionality may lie hidden even from an avid user for months of use, but a wide range of samples, aggressive development and an active user community make DataGraph a really interesting solution for professional charting. If you are looking for a tool to create beautiful yet easy-to-understand static charts, DataGraph may be your tool of choice. And if your medium is print, DataGraph outshines any other application on the market.
    • The best way to see samples of DataGraph’s capabilities is to download the free trial and browse the samples/templates on the application’s startup screen.
  • R is an open-source programming environment for statistical computing and graphics. A super-powerful tool, R takes some programming skills to even get started with, but it is becoming a standard tool for any self-respecting “data scientist”. An interpreted, command-line-controlled environment, R does a lot more than graphics, as it enables all sorts of crunching and statistical computing, even with enormous data sets. In fact, we’d say that graphics are a little bit of a weak spot of R. Not that there is much to complain about from the information visualization standpoint, but most of the charts that R creates would not be considered refined and therefore need polishing in other software, such as Adobe Illustrator, to be ready for publication. Not to be missed if you’re working with R is the ggplot2 package, which helps overcome some of the thornier aspects of making R’s charts and graphs look proper. If you can program and need a powerful tool for graphical analysis, R is your tool, but be prepared to spend significant time making your output look good enough for publication, either in R or by exporting the graphics to another piece of software for touch-up.
    • The R Graphical Manual holds an enormous collection of browsable samples of graphics created using R – and the code and data used to make a lot of them.

Videos and custom high-resolution graphics

If you are creating data visualization videos or high-resolution data graphics, Processing is your tool. Processing is an open source integrated development environment (IDE) that uses a simplified version of Java as its programming language and is especially geared towards developing visual applications.

Processing is great for rapid development of custom data visualization applications that can be run directly from the IDE, compiled into stand-alone applications, or exported as Java Applets for publishing on the web.

Java Applets are less than optimal for web publication (ok, they simply suck for a variety of reasons), but a complementary open-source project – Processing.js – has ported Processing to JavaScript, using the canvas element to render the visuals (canvas is a way to render and control bitmap graphics in modern web browsers using JavaScript). This is a far superior way to take Processing work online, and strongly recommended over the Applet route.

The area where we have found that Processing really shines as a data visualization tool is in creating videos. It comes with a video class called MovieMaker that allows you to compose videos programmatically, frame by frame. Each frame may well require some serious crunching and take a long time to calculate before it is appended to a growing video file. The results can be quite stunning. Many of the best-known data visualization videos have been made using this method.

Many other great examples showing the power of Processing – and for a lot more than just videos – can be found in Processing.org’s Exhibition Archives.

As can be seen from these examples, Processing is obviously also great for rendering static, high-resolution bitmap visualizations.

So if data driven videos, or high-resolution graphics are your thing, and you’re not afraid of programming, we recommend Processing.

Charts for the Web

There are plenty – dozens, if not hundreds – of programming libraries that allow you to add charts to your web sites. Frankly, most of them are sh*t. Some of the more flashy ones use Flash or even Silverlight for their graphics, and there are strong reasons for not depending on browser plugins for delivering your graphics.

We believe we have tested most of the libraries out there, and there are only two we feel comfortable recommending; each has its pros and cons depending on what you are looking for:

  • Highcharts is a JavaScript charting library that renders vector-based, interactive charts in SVG (or VML for older versions of Internet Explorer). It is free for non-commercial use, and commercial licenses start at $80. It is a flexible and well-designed library that includes all the most common chart types with plenty of customization and interactivity options (a minimal configuration sketch follows this list). Interestingly enough, even though Highcharts is a commercial solution, the source code is available to developers who want to make their own modifications or additions. With plenty of examples, good documentation and active user forums, Highcharts is a great choice for most development projects that need charting.
  • gRaphaël is another JavaScript charting library, built on top of Raphaël (see below). Like Highcharts, gRaphaël renders SVG graphics on modern browsers, falling back to VML for IE <9. While holding a lot of promise, gRaphaël is not a very mature library, with limited capabilities, few chart types, even fewer examples and pretty much non-existent documentation. It is, however, available under proper open source licenses and could serve as a base for great things for those who want to extend these humble beginnings.
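For a feel of what the Highcharts approach looks like in practice, here is a minimal, hedged sketch. It assumes the Highcharts script has already been loaded on the page and that a div with id "container" exists; the data is made up and exact option names can vary between versions.

```javascript
// A minimal Highcharts sketch (illustrative; assumes the Highcharts library is loaded on the page).
var chart = new Highcharts.Chart({
  chart: { renderTo: 'container', type: 'line' },       // draw into <div id="container">
  title: { text: 'Visits per month' },                  // made-up example data throughout
  xAxis: { categories: ['Jan', 'Feb', 'Mar', 'Apr'] },
  yAxis: { title: { text: 'Visits' } },
  series: [{ name: '2012', data: [1200, 1480, 1390, 1710] }]
});
```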

Other libraries and solutions that may be worth checking out are the popular commercial solution amCharts, Google’s hosted Chart Tools and jQuery library Flot.

Special Requirements and Custom Visualizations

If you want full control of the look, feel and interactivity of your charts, or if you want to create a custom data visualization for the web from scratch, the out-of-the-box libraries mentioned above will not suffice.

In fact, you’ll be surprised how soon you run into limitations that will force you to compromise on your design. With seemingly simple preferences such as “I don’t want drop shadows on the lines in my line chart” or “I want to control what happens when a user clicks the x-axis”, you may already be stretching your chosen library. But consider yourself warned: the compromises may well be worth it. You may not have the time and resources to spend diving deeper, let alone writing yet-another-charting-tool™.

However, if you are not one to compromise on your standards, or if you want to take it up a notch and follow the lead of some of the wonderful and engaging data journalism happening at the likes of the NY Times and The Guardian, you’re looking for something that a charting library is simply not designed to do.

The tool for you will probably be one of the following:

  • Raphaël, gRaphaël’s (see above) big brother. Raphaël is a powerful JavaScript library to work with vector graphics. It renders SVG graphics for modern browsers and falls back to VML for Internet Explorer 6, 7 and 8. It comes with a range of good looking samples and decent documentation. Raphaël is open source, and any developer should be able to hit the ground running with it to develop nice looking things quite fast. We don’t recommend Raphaël for the advanced charting part, but for entirely custom data visualizations or small data apps it may very well be the right tool for the task.
  • Protovis is an open source JavaScript visualization toolkit. Rather than simply controlling at a low level the lines and areas to be drawn, Protovis allows the developer to specify how data should be encoded in marks – such as bars, dots and lines – that represent it. This approach allows inheritance and scales that enable a developer to construct custom chart types and layouts that can easily take in new data without the need to write any additional code. Protovis natively uses SVG to render graphics, but a couple of efforts have been made to enable VML rendering, making Protovis an option for the older versions of Internet Explorer that still account for a significant proportion of traffic on the web.

    Protovis was originally written by Mike Bostock (now a data scientist at Square) and Jeffrey Heer of the Stanford Visualization Group. Their architectural approach is ingenious, but it also takes a bit of an effort to wrap your head around, so be prepared for somewhat of a learning curve. Luckily there are plenty of complete and well-written examples and decent documentation. Once you get going, you will be amazed at the flexibility and power that the Protovis approach provides.

  • D3.js, or “D3” for short, is in many ways the successor of Protovis. In fact, Protovis is no longer under active development by the original team, as its primary developer – Mike Bostock – is now working on D3 instead.

    D3 builds on many of the concepts of Protovis. The main difference is that instead of having an intermediate representation that separates the rendering of the SVG (or HTML) from the programming interface, D3 binds the data directly to the DOM representation. If you don’t understand what that means – don’t worry, you don’t have to. But it has a couple of consequences that may or may not make D3 more attractive for your needs.

    The first is that it – almost without exception – makes rendering faster, and thereby makes animations and smooth transitions from one state to another more feasible. The second is that it only works in browsers that support SVG, so you will be leaving Internet Explorer 7 and 8 users behind – and due to the deep DOM integration, enabling VML rendering for D3 is a far bigger task than for Protovis, and one that nobody has embarked on yet.
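To make the “binds the data directly to the DOM” point a little more concrete, here is a tiny sketch in the spirit of D3’s canonical bar chart examples. It assumes the d3 script is loaded and an empty div with id "chart" exists; it is meant as a flavor of the data join, not a tutorial.

```javascript
// A tiny D3 sketch (illustrative): bind an array of numbers to div elements,
// one div per number, and size each div in proportion to its value.
var data = [4, 8, 15, 16, 23, 42];   // made-up values

d3.select('#chart')                  // an existing, empty <div id="chart"></div>
  .selectAll('div')
  .data(data)                        // the data join
  .enter().append('div')             // one new div for each datum
  .style('width', function (d) { return d * 10 + 'px'; })
  .style('background', 'steelblue')
  .text(function (d) { return d; });
```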

After thorough research of the available options, we chose Protovis as the base for building out DataMarket’s visualization capabilities with an eye on D3 as our future solution when modern browsers finally saturate the market. We see that horizon about 2 years from now.

Written by Hjalmar Gislason

April 4, 2012 at 2:50 am

Posted in Uncategorized

Effective Data Visualization: Presentation at Strata

leave a comment »

Here are the slides from Hjalmar Gislason’s presentation at Strata in Santa Clara on Feb 29th, 2012

Note that most images are links to further information such as demonstrations, libraries, blog posts, etc.

Written by Hjalmar Gislason

February 29, 2012 at 7:51 pm

Posted in Uncategorized

A Big Launch Coming Up at Strata: Data Publishing Solutions!

with 2 comments

I know, I know. A blog is neither the place nor the medium for press releases. I’m just too excited about this not to publish here what I’ve just pushed out through the wires!

Note the reference to a new major customer (and more to come) and all the exciting new things that are hinted at. I’ll give you a few additional keywords:

  • Data Uploads
  • Topic pages (build your own automated dashboards and reports)
  • Group sharing and private content
  • More end-user functionality for free
  • World-leading premium data providers
  • …and more

This is by far our biggest upgrade since the launch of the international data offering last year. We will publish in-depth descriptions and examples here as we launch next week.

So, here it goes. Please share this with your media contacts and friends:

DataMarket Announcing Data Publishing Solutions for Research Companies at Strata
BOSTON — February 21, 2012

DataMarket, the company behind the leading data portal DataMarket.com, is launching a range of data publishing solutions for research companies, analysts and data enthusiasts at O’Reilly’s Strata Conference next week.

These solutions allow customers to easily publish their data sets and collections and make them available for users to search, visualize, compare and download, either for free or for a fee.

Ranging from simple uploads of data sets for private use on DataMarket.com to full rebranding of DataMarket’s system to run on top of customers’ databases as an integrated part of their web site, these new solutions open exciting possibilities to data providers of all sizes.

“We’re excited about DataMarket’s Enterprise solution as a new interactive and visual tool to analyze our research data,” says Carl Howe, VP of Data Sciences Research at Yankee Group, one of several information companies already implementing DataMarket’s solutions as a part of their research and data publishing process. “We believe that tools such as DataMarket’s will democratize access to the ‘big data’ driving today’s mobile ecosystem, so we’re excited to be working together to bring that capability to our analysts and users.”

DataMarket’s new data publishing solutions will be launched and immediately available to new and existing customers on February 29th. Details on functionality and pricing will be announced at the launch.

- – -

DataMarket helps business users find and understand data, and data providers efficiently publish and monetize their data and reach new audiences.

DataMarket’s unique data portal – DataMarket.com (http://DataMarket.com/) – provides visual access to billions of facts and figures from a wide range of public and private data providers including the United Nations, the World Bank, Eurostat and the Economist Intelligence Unit.

For further information contact:

Hjalmar Gislason, founder and CEO
hg@datamarket.com

Written by Hjalmar Gislason

February 21, 2012 at 9:39 pm

Posted in Uncategorized
