A week ago we silently released a new, exciting feature to DataMarket: Choropleth maps.
Choropleth maps are geographical maps where areas, such as countries or states, are colored based on the value of an indicator. They are a great way to explore and reveal geographical patterns in data.
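At its core, a choropleth is just a mapping from an indicator value to a color on a scale. Here is a minimal sketch of that idea in Python — the palette, the region codes and the GDP figures are illustrative stand-ins, not DataMarket’s actual implementation:

```python
def value_to_color(value, vmin, vmax, palette):
    """Map an indicator value onto a discrete color palette by
    normalizing it into [0, 1] and picking the matching bucket."""
    if vmax == vmin:
        return palette[0]
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp out-of-range values
    index = min(int(t * len(palette)), len(palette) - 1)
    return palette[index]

# Hypothetical GDP-per-capita figures (USD), for illustration only
gdp = {"ISL": 47000, "BRA": 11000, "USA": 52000, "IND": 1500}
palette = ["#deebf7", "#9ecae1", "#4292c6", "#084594"]  # light to dark blue
lo, hi = min(gdp.values()), max(gdp.values())
colors = {iso: value_to_color(v, lo, hi, palette) for iso, v in gdp.items()}
```

Each region on the map is then filled with its assigned color; everything else — projections, zooming, sub-national boundaries — is rendering on top of this simple mapping.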
You’ve seen choropleth maps a million times, but may not have known what they’re called. And creating one has never been this simple: find the data you want to view on DataMarket.com (or upload your own), and select the Choropleth map chart type. That’s it.
Here’s an example showing GDP per capita across the globe:
And you can zoom into different regions of the world, such as Europe:
There are even two sub-national maps available already, one for the regions of Brazil and the other one for the states of the United States of America. Here’s one that shows the latest unemployment numbers in the US, by state:
Implementation details – for those who are interested
As with everything we do at DataMarket, we have to approach things in a very generic way. There are almost 70 thousand data sets already available on our public site alone, holding more than 310 million time series. Furthermore, users can upload their own data. So every chart, export format and feature of the system has to act in a generic way to accommodate a wide range of values, different sizes of data sets and all the weird edge-cases. This is very different from creating a single choropleth by hand, or hacking a single map for use with a single data set.
Choropleths are no exception. We believe we’ve done quite well, but we had to make all sorts of design choices when implementing them. We will cover those choices separately in a follow-up post in the coming days.
DataMarket aggregates and normalizes quantitative data from a wide variety of sources, enabling users to “go from question to shared insight in minutes”.
To do so, DataMarket guides users through a specific workflow. An analyst has come up with a question – say, “Who are the world’s biggest oil producers and where is the US in that regard?” – and is seeking data that can provide an answer or serve as an input into a data model or decision making process.
The workflow to answer that question is the following:
- Search: Keyword queries or navigation that helps users to identify data sets that may be relevant to the question. The keyword query here might be: [US oil production]. This will return a list of data sets that the user can scan and select one that is likely to have an answer, e.g. “Oil: Production tonnes” from BP.
- Select: Selecting the data from the data set that may shed light on the question. When first opening a data set, this is done automatically based on information in the user’s search query (if such hints are available) or the properties of the data itself, favouring e.g. world totals over values for individual countries, etc. In some cases no such hints or properties are available, in which case the user is guided through the selection process. In our example above, the system will automatically select the US because that term was used in the search query, immediately resulting in a line chart showing the history of oil production in the US from 1965-2012.
However, as our question was about the world’s largest oil producers, we select all of the countries by checking the box next to the title of the country “dimension”. This will result in a somewhat incomprehensible line chart with more than 60 lines!
- Display: This is where the user decides how to display the selected data. In this case we want to see a ranked comparison of oil production in different countries in the latest available year. The chart type that yields that view is the bar chart. Selecting that, we see that there are some totals in this data set that “pollute” the list, so we go back to the select step to remove these. This leaves us with a chart showing the long tail of oil production in the world.
We’ve arrived at our answer: In 2012, Saudi Arabia was the world’s biggest oil producer, followed closely by Russia with the US in a distant (but somewhat secure) 3rd position.
- Export: Now all we need to do is share our finding with our colleagues or customers, or take the data elsewhere for further work or analysis. The Export tab allows the user to export the data or the resulting chart in a number of different ways (Excel, CSV, PNG, SVG, PDF or straight to PowerPoint), connect to the data from other systems using live data feeds (Excel, R) or our generic API, or share a link to the data in email, IM or on social media. Let’s assume that the user simply wants to share her new knowledge with a co-worker. Click the “Short URL” option under “Share”, copy the snappy little data.is link and send it off via email.
Clicking the link will take the recipient to exactly the same view. And the user can continue whatever she was doing when the question came up.
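The Display step in the workflow above — ranking countries by the latest year’s value — boils down to a simple sort over the exported data. A sketch in plain Python against a stand-in CSV export (the column names and figures are illustrative approximations, not an actual DataMarket export):

```python
import csv
import io

# A tiny stand-in for an exported CSV; a real export has many more rows.
# Figures are approximate 2012 production in million tonnes.
csv_text = """country,year,production_mt
Saudi Arabia,2012,547.0
Russia,2012,526.2
United States,2012,394.9
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Rank descending by production, exactly what the bar chart view shows.
ranked = sorted(rows, key=lambda r: float(r["production_mt"]), reverse=True)
for position, row in enumerate(ranked, start=1):
    print(position, row["country"], row["production_mt"])
```

The same sort-and-rank step works regardless of which data set was selected, which is why the bar chart can be offered generically across the whole collection.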
In the above example the answer to the question was found in a time series data set, the most common type of research document found in DataMarket, but the same workflow also applies to other types such as survey questions and text documents, although the available functionality varies at each step given the nature of the data/document.
The example also relies on the fact that the data to answer the original question is indeed available in our data collection. The more than 50,000 data sets that are already there hold more than 3 billion numerical facts about the world, so there is a lot in there. But in short we’re strong when it comes to macro-economic data, and spotty when it comes to more industry-specific or sub-national data. This is where our Data Hub product comes in, allowing you to use the DataMarket with any data you want, whether from public sources, syndicated premium research or private data from your computer or corporate network. Read more about the Data Hub here.
There are various other aspects of the system that support the workflow and common data tasks around it. Most importantly:
- Combining data: The system allows data from two or more data sets to be combined in a single view, regardless of the original format or source of the data. As an example one could compare the historical growth of oil production in Saudi Arabia to their GDP growth.
- Collecting data: Multiple data views (charts, tables, maps, …) – typically on the same topic or project – can be gathered on a single page, called a topic page, for quick reference. Here’s one on food and agriculture as an example. This can serve several purposes:
- Bookmarking for later reference
- Creating a dashboard for quick overview (each data widget updates as new data becomes available)
- Sharing collected insights with a team to facilitate discussion and decision making.
- Embedding: A chart or a table can be embedded on a 3rd party website similar to a YouTube video or a SlideShare slide show.
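The “combining data” feature above amounts to joining series from different sources on a shared time dimension. A minimal sketch of that join, with hypothetical yearly values (not real figures from either source):

```python
# Hypothetical yearly values from two different data sets
oil_production = {2010: 473, 2011: 510, 2012: 547}  # million tonnes
gdp = {2011: 670, 2012: 734, 2013: 744}             # billion USD

# Inner join on the years both series cover
common_years = sorted(oil_production.keys() & gdp.keys())
combined = {year: (oil_production[year], gdp[year]) for year in common_years}
```

Normalizing every data set onto common dimensions (such as a date axis) up front is what makes this kind of combination possible regardless of the data’s original format or source.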
You can read more about individual features of the system in our product tour.
This weekend we rolled out the largest upgrade of our data viewer user interface since we introduced the HTML5 charts in 2011. As this interface is in many ways the heart of the DataMarket experience, we are obviously quite excited to see this go live.
Long-time DataMarket users will notice a series of changes, and new users should now find the site even more user friendly and attractive. Here’s what’s new:
Three tab interface
All the controls that used to be around the chart on the right are now gone, putting more emphasis on the chart itself and leaving the chart area cleaner. The controls are now found under their respective tabs in the panel on the left hand side. Each tab supports a logical step in a workflow once you’ve opened the data set you wanted to work with:
- Select: This is where you select the data you want to view from the open dataset(s). This is in fact what has always been on the left-hand side panel. A notable difference, however, is that there is no “Visualize” button. Instead charts and tables update immediately as the data selection is changed. See more on this below.
- Display: This is where you control the display of the data. Pick your chart types, select the period or point in time you want to view, configure chart settings, edit chart titles and so on. Look for new and exciting chart types here soon!
- Export: This is where you “take it away”: Export the data you have selected or the charts you have created in your desired format (CSV, Excel, PowerPoint, PDF, PNG, SVG, …); connect live to our vast collection of normalized data from other systems (Excel, R, or anywhere else using our generic API); embed interactive versions of the charts in your own blog posts and articles; or share your findings with others on social media or email/instant messaging using the short URLs.
We have been fans of instant feedback for a long time, and we finally got around to doing something about it. Every change you make to a data selection or a data view is now instantly reflected on the screen. There are no “Visualize” or “Apply” buttons. Instead your actions have immediate effect on the data you are looking at. We believe this approach makes working with data a lot more natural – almost tactile – encouraging exploration and experimentation and facilitating faster insight.
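The pattern behind a no-Apply-button interface is the classic observer: the selection notifies the chart on every mutation, so there is never a stale state waiting for a button click. A toy sketch of the idea (illustrative only, not DataMarket’s actual code):

```python
class Selection:
    """A data selection that notifies listeners on every change."""

    def __init__(self):
        self._countries = set()
        self._listeners = []

    def on_change(self, callback):
        """Register a callback to run after every mutation."""
        self._listeners.append(callback)

    def toggle(self, country):
        """Add or remove a country, then notify immediately."""
        self._countries.symmetric_difference_update({country})
        for callback in self._listeners:
            callback(sorted(self._countries))

redraws = []                   # stands in for chart re-renders
selection = Selection()
selection.on_change(redraws.append)
selection.toggle("US")         # chart redraws with ["US"]
selection.toggle("Russia")
selection.toggle("US")         # toggling off also triggers a redraw
```

Every `toggle` produces a redraw, which is exactly the “almost tactile” feel described above: the view is never out of sync with the selection.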
In the world of data, context is everything, and often you need additional meta-data to fully understand what you are looking at, how the data is collected and what it really means. In fact, two of our principles remind us that we must provide all the necessary information and context for our users to fully understand any data view. We therefore go to great lengths to acquire associated meta-data and details for any data we make available¹. Access to this information has always been available under “Detailed information”, a menu item on the “hamburger icon” next to each data set, but we’ve now given it a lot more weight, putting it right below the chart area, easily expanded by clicking the “Show detailed information” link next to the source reference.
We are quite proud of this upgrade. It also provides us with a UI framework that allows us to logically expand the data viewer’s functionality, including data transformations, statistical analysis and additional visualization types. Stay tuned for that.
We welcome any questions, comments and ideas on the new interface. Feel free to comment here or reach out to email@example.com with your feedback.
- – -
1 Unfortunately many data providers leave users of their data – us included – a little too much in the dark about the way they make their meta-data available.
My job has a serious occupational hazard. We work with so much interesting data, holding the keys to so many – sometimes untold – stories, that a casual opening of a data set can quickly lead to hours of nerdy investigation; trying to understand what might explain a sudden rise, drop or trend.
Following are some of my favorites. Click the thumbnails for full context and interactive charts:
Potatoes and the Irish
Japanese fire horses
We got CO2 too
The “terrible” medicalization of childbirth
Market intelligence data from secondary sources is a key ingredient for businesses to understand their business environment, guiding their strategy work, decision making, planning and monitoring. There are a lot of data sources out there, but what is it that makes data worth paying for?
Through our work in helping information publishers better deliver their data products, and information consumers find and understand the data they need, we have identified five characteristics that make people willing to part with their bucks in exchange for bits:
- Quality: Is the data more accurate, more thorough or more reliable than alternatives?
- Proprietary-ness: Are there no alternative sources for the data – or rather for the insights to be had from it?
- Timeliness: Is this the most frequently or quickly updated source of the data?
- Curation: Does the source make my life easier by aggregating and surfacing all the most important data for my purposes in one place?
- Analysis: Is the data associated with insightful commentary and observations?
Obviously a data source is even more valuable if it has a combination of more than one of these properties. Can you think of additional properties we are forgetting?
We have some exciting news! ProQuest, a leading provider of information solutions to academia and government, has chosen our Data Delivery Engine to bring vast collections of statistical data to millions of faculty, staff and students around the world.
The product – named ProQuest International DataSets – is being announced and demonstrated today at the ALA conference in Chicago and will be launched this fall. It brings together not only some of the macro-economic data collections that we have previously made available on DataMarket.com, but also new and exciting sources including some of ProQuest’s own data collections and proprietary databases from Oxford Economics.
This is a major deal for us. Not only in the business sense, but also as it enables us to bring our technology and vision of putting the world’s data at the fingertips of those that need it to an enormous new audience. We see this as a great recognition of what we’ve already built, and a motivation to achieve even more.
That said, this is only the first step in a partnership that is going to last for a long time and bear a lot of new fruit in the months and years to come.
For details, see the official press release.
Business intelligence is too focused on internal data and doesn’t help decision-makers understand their external business environment
When you’re driving a car, you spend most of the time looking out the windshield: What’s ahead? When to turn? Any unexpected obstacles in your way? Every now and then you glance at the dashboard. But that’s just to check how fast you’re going, whether you’ve got enough gas and that the engine isn’t running too hot.
In BI, it seems to be the other way around. BI traditionally focuses on data from internal systems, providing a view on internal operations and past performance. Data on the external business environment and the future belongs to the fragmented world of corporate, strategic or market intelligence with their proprietary databases, pivot tables, PDF reports and custom research delivered in static slide decks.
In other words, decision-makers in the enterprise spend a lot of their time looking at the dashboard — sometimes literally — but only have a blurry and fragmented view of their surroundings: the markets they operate in, the economies they belong to and the demographics they target.
There is a lot of good data out there, from public and proprietary sources alike. Government databases are opening up and contain more valuable information than most people realize. Syndicated research — trackers, forecasts and surveys — is plentiful but hard to find and quickly gain insight from. And data from custom research, whether internal or from research vendors, is usually delivered in static formats. As a result, too much of it ends up sitting on hard drives somewhere with no good way to search, compare or access later – let alone to keep an eye on updates to the underlying data.
This is fundamentally inefficient. It means that decisions aren’t made with reference to the best available data; time is lost digging through piles of static documents; and companies are unable to make the most of the sizable investments they’ve already made in market intelligence.
I believe this is the next frontier for intelligence systems. The task is to provide a window on companies’ external business landscape, expanding BI as it has traditionally been used to encompass the rest of the universe of corporate intelligence.
- – -
Note: This was originally published as a guest article on DailyTekk in May 2013.