The Emerging Field of Data Markets – our competitive landscape
One of the most common questions we’ve gotten since our international launch is “How is DataMarket different from X?”, where X is pretty much any of the companies or solutions mentioned below.
While the obvious answer to that question is: “Try us out and see for yourself!”, this post is an attempt to explain.
What is DataMarket?
DataMarket is a search engine for statistical data. We’re connected to a growing list of statistical data providers such as the UN, the World Bank, Eurostat, Gapminder and others, already holding more than 100 million time series on a variety of topics. Using DataMarket, users can search, visualize, compare and download data from these providers in a single place. We’re working hard to integrate more data sources, including data from premium sources with financial data, market research, research analysis and more. Being able to answer users’ queries with the best available statistics from free and premium data sources alike is a unique proposition that will turn DataMarket into an active marketplace for statistics, connecting data seekers with data providers.
Or, as we wrote in the first draft of the business plan back in early 2008: DataMarket’s mission is to build an active marketplace for structured data and statistics.
That said, here’s a rundown of some of the other players in this emerging field:
Timetric
Status and background: Timetric is a UK-based startup. The company was founded in mid-2008 and received seed funding in early 2010 after winning TechCrunch’s London Mini Seed Camp 2009. Participants in that round included former LastFM chairman Stefan Glänzer, Alex Zubillaga, well-known angel investor Sherry Coutu and Matteo Stefanel, as well as Sean Park and Udayan Goyal of Nauiokas Park.
In their own words: We design and build statistical data services to help people and businesses make better decisions. Our services are built around the Timetric Platform, software we’ve developed for publishing, managing, and enabling social analysis and visualization of very large collections of time-series data. Our services include timetric.com, a leading aggregator of public and governmental economic data, and Timetric Portfolios, a simple and social tool for picking and analysing stock portfolios. – http://www.linkedin.com/company/timetric-ltd
Business idea: Judging from their website, Timetric offers two product lines. The first is a search engine for public statistics with “2,575,883 statistics from sources including The World Bank, Eurostat and the Office for National Statistics, ready to see, share and download.” (Timetric front page, Feb 25th, 2011). The second is exclusive data, giving “New insight through our exclusive statistics. We mine real businesses’ data to bring you what’s really going on.” This possibly hints that the company is going after the Business Intelligence sector with a hosted SaaS solution.
Key people: Andrew Walkingshaw (@covert), CEO/Cofounder; Dan Wilson, Co-founder and Platform Architect; Toby White (@tow21), Co-founder & Front-end Developer; Simon Briscoe (@simonbriscoe), Vice President, Product (formerly Statistics Editor at Financial Times)
Team size: 6 (according to CrunchBase)
Google Public Data
Status and background: Google Public Data is a project in Google Labs. Google Public Data’s history goes back to March 2007, when Google acquired Gapminder’s Trendalyzer software. Results from Google Public Data have been turning up for certain types of queries since April 2009, and in March 2010 the company released Google Public Data Explorer, a basic list of the available data sets in Google Public Data.
In their own words: The Google Public Data Explorer makes large datasets easy to explore, visualize and communicate. As the charts and maps animate over time, the changes in the world become easier to understand. You don’t have to be a data expert to navigate between different views, make your own comparisons, and share your findings. – http://www.google.com/publicdata/home
Business idea: Google is already using GPD’s data to make Google search results more informative and useful. This should indirectly draw more traffic and thereby more search ad revenue. It remains to be seen if and how Google sees GPD as an independent revenue generator. The technology could be used as a statistics-publishing platform for large organizations, either for free (giving Google more interesting data to work with) or for a fee (a sort of hosted BI / data publishing solution). The latter is unlikely, as it is very different from other business models Google uses. Equally unlikely is the possibility that Google will start distributing premium data for a commission, taking money from end-users of the data. GPD’s release of upload capabilities and the Dataset Publishing Language (DSPL) last week seems to support the idea that the team sees GPD becoming a publishing platform for statistical data.
Key people: Omar Benjelloun, Benjamin Yolken, Jürgen Schwärzler
Team size: Approx. 10 people
Wolfram|Alpha
Status and background: Wolfram|Alpha was launched in March 2009. It is a product of Wolfram Research, a private company based in Champaign, Illinois, and led by computer scientist Stephen Wolfram. The company’s main product is a successful technical computing software suite called Mathematica that has been on the market since 1988. Wolfram|Alpha is built on top of Mathematica, but is already profitable on its own thanks to licensing search results to Microsoft’s Bing and “more deals [that] will be announced soon” (source).
In their own words: Wolfram|Alpha introduces a fundamentally new way to get knowledge and answers— not by searching the web, but by doing dynamic computations based on a vast collection of built-in data, algorithms, and methods. [...] Wolfram|Alpha’s long-term goal is to make all systematic knowledge immediately computable and accessible to everyone. – http://www.wolframalpha.com/about.html
Business idea: I’ll start this with a personal note: Wolfram|Alpha is one of my favorite technologies ever made. It is deeply awesome! Thinking about the raw power needed to parse my query string, search for the data and make the calculations needed for results like this one, “was it raining in London when prince william was born?”, makes my nerd heart skip a beat. Now, Stephen Wolfram’s mission with Wolfram|Alpha is an ambitious one: to make a “computational knowledge engine” that will topple Google as the place people go to search for knowledge. I’m certainly not going to compare our efforts to that, but there is an overlap between the way Wolfram|Alpha tries to answer certain types of user queries and what we’re doing at DataMarket. Currently the business model lies in licensing search results to others (see above) and licensing the API, but the long-term plan remains to be seen. The most likely result is an ad-based model similar to that of Google, but one should keep in mind that Wolfram Research has been successfully selling commercial products for more than 20 years and might have something different in mind.
Key people: Stephen Wolfram, founder and CEO.
Team size: Wolfram Research employs about 500 people. Given that Wolfram|Alpha is built on top of their flagship product, Mathematica, it’s fair to assume that a significant part of that staff is working on W|A in one way or another.
Other Data Markets
Infochimps
Status and background: Infochimps launched at DEMOfall 09 in September 2009. They were the first data market to receive serious attention from the Silicon Valley crowd and are often mentioned by the media when covering the emerging field of data markets. The company has taken in about $1.6M in total funding, most recently $1.2M in autumn 2010. Soon after, Infochimps acquired Data Marketplace, a Y Combinator-backed player in the same space.
In their own words: Infochimps is a venture-backed Austin startup with the mission to democratize access to structured data. Initially the brainchild of two graduate physics students at the University of Texas, Infochimps is indexing and connecting the world’s data and making it searchable. Our technology allows us to host and distribute that data in a variety of formats, with a focus on making lists, spreadsheets and datasets easy to find and consume. – http://www.infochimps.com/press
Business idea: Infochimps aims to become the search engine for data sets, regardless of content, format or purpose. The initial target audience seems to be the developer community, and for that purpose the company provides API access on top of some of their most interesting listed data sets. Infochimps does not make any effort to normalize data, nor do they emphasize storing data from different data sets in a structured format in their systems. Many of the listed data sets are downloadable as raw data files, either for free or for a fee, and some of the listings are simply links to data sets on 3rd party sites. This means that the company can quickly build a sizable data catalog, focusing on becoming “the place” for users to search for the data sets they need, but leaving them to figure out how to work with the data themselves. A core part of the business model is to take a commission on any data that is sold on the site or after referral.
Team size: Approx. 12-14
Factual
Status and background: Factual was founded by Gil Elbaz, known as “the man that invented Google Adsense” – no less! This autumn, Factual took in $25M in a first round of funding, following up on a previous $2M, from a list of big-name investors such as Andreessen Horowitz, Bill Gross, Esther Dyson, Index Ventures and more.
In their own words: Factual is a platform where anyone can share and mash open, living data on any subject. For example, you might find a comprehensive directory of restaurants along with dozens of searchable attributes, 15MM+ US business listings, or a list of every video game and their cheat codes. We provide smart tools to help the community build and maintain a trusted source of structured data.
Business idea: Factual seems to be on a similar mission to that of Freebase: to build an enormous collection of structured data with a mix of automated methods, editorial work and crowdsourced commits and edits. The database is already significant in size, with a wide variety of data, even though many of their published examples involve geospatial data of some sort. Their current target group is mainly developers, and the company openly talks about plans for having a commercial API soon (http://www.factual.com/pricing). In fact they don’t seem to have any ambitions to become a destination site, describing themselves as a “Switzerland” of data (http://thisweekin.com/thisweekin-startups/this-week-in-startups-106-gil-elbaz-founder-of-factual-com/) – a key difference from Freebase.
Team size: 32 according to their – very meta – employees page
Freebase
Status and background: Freebase launched publicly in March 2007. Their developers – Metaweb – had earlier taken in $57M in two rounds of serious funding to take on the challenge of creating and seeding an enormous “knowledge platform”, often referred to as “a Wiki for structured data”. Metaweb was acquired by Google in the summer of 2010. Some sources claim that Metaweb had actually burned most of their funding already and that the acquisition by Google was a move to acquire talent, and to some extent technologies that Google could make use of elsewhere, rather than necessarily to run Freebase as a business unit of its own. Freebase, however, remains up and running and there is still active maintenance of a lot of data in the system, so the jury is certainly still out on whether these rumors are true.
In their own words: “Freebase is an open, Creative Commons licensed collection of structured data, and a platform for accessing and manipulating that data via the Freebase API.” – http://wiki.freebase.com/wiki/FAQ
Business idea: Before the Google acquisition there were visible signs of two monetization tactics by Freebase, although neither seemed actively pursued: a) becoming a destination site for information seekers, and monetizing that traffic with targeted advertising; and b) API subscriptions for heavy usage. A quick look through their wiki and other About pages now shows no signs of either.
Key people: Danny Hillis, co-founder of Applied Minds and a legendary thinker and innovator.
Team size: ???
Windows Azure Marketplace DataMarket
Status and background: A Microsoft project, originally announced as Codename “Dallas” at Microsoft’s PDC09 in November 2009. In October 2010, the name was changed to Windows Azure Marketplace DataMarket (“Azure DataMarket” or simply “DataMarket” for short). That can be a little confusing for us, so we refer to them as WAMDM (pronounced WHAM-DAM).
In their own words: “DataMarket is a service that provides a single consistent marketplace and delivery channel for high quality information as cloud services. Content partners who collect data can publish it on DataMarket to increase its discoverability and achieve global reach with high availability. Data from databases, image files, reports and real-time feeds is provided in a consistent manner through internet standards. Users can easily discover, explore, subscribe and consume data from both trusted public domains and from premium commercial providers.” – https://datamarket.azure.com/about
Business idea: The core driving force behind Azure DataMarket is almost certainly to make the Azure platform more attractive to developers: not only do they get the cloud services and the application environment, but also simple and reliable access to a lot of data that can be used to enrich these applications and enable new ones. However, with integrations to PowerPivot and Excel, the team seems to have ambitions that reach beyond that: making WAMDM data available to end users, especially power users on the analytical side, and thereby supporting other Microsoft product lines. My prediction: Expect an integration with Microsoft BI solutions sooner rather than later.
Team size: 10-14 dedicated to this project?
BuzzData
Status and background: BuzzData is a stealth-mode project being spun out of Canadian web development company Unspace, expected to launch in Q2 2011.
In their own words: “BuzzData, a marketplace for data. Currently under development, they are tackling a huge unsolved problem: helping people find data they need. BuzzData provides tools to collaborate on data in a social manner. They are positioning to be the first destination for people working with data; a commons where datasets of all types can be shared.” – http://strataconf.com/strata2011/public/schedule/detail/17593
Business idea: The company has not released a lot of information about the project. However, from their participation in the Strata conference, and without revealing too much from conversations I had with them there, the core idea is to enable and enhance a conversation around data sets, linking data with targeted comments, discussions, etc. I don’t know what the monetization plan is, but there are definitely a few options. For high profile data sets (think US cables on Wikileaks or Guardian’s MP expense reports) a successful implementation could drive enough traffic to be able to draw advertising revenues. Another option would be to sell these capabilities to online publishers, etc.
Key people: Pete Forde (@peteforde)
Team size: 3-5?
Kasabi
Status and background: Kasabi is a stealth-mode project by UK-based software company Talis. Early this year, they started a blog, and according to the information published there, they plan a launch – at least a closed beta – in “early 2011”. Talis is actually a 40-year-old company with roots in software for libraries. They are the developers of the Talis Platform, a relatively well-known solution for Linked Data storage, which runs – among other things – Linked Data efforts for Data.gov.uk and the BBC. As I understand it, Kasabi is an effort to extend that platform to be run as a service (XaaS) for data publishers on the one hand, and developers on the other.
In their own words: “It’s an environment that is designed to bring together data publishers and consumers, and provide them with the tools they need to discover, share, remix and use data.” – http://tbe.taleo.net/NA9/ats/careers/requisition.jsp?org=TALIS&cws=1&rid=29 “At its core Kasabi will be a data marketplace, but it will be much more than that. By leveraging the power of Linked Data and, perhaps more importantly, embracing the concept of shared innovation, we think we have something interesting and unique that will not only be fun to explore, but which we hope will generate real value for the whole ecosystem.” – http://blog.kasabi.com/2011/01/12/is-this-thing-on/
Business idea: Again, not a lot of information out there, but one can safely assume that revenues are expected to come as a commission of any data that is sold through the platform. I would also expect that there will be paid options for those that want to host their data on the platform, especially those that want to make their data available for free.
Team size: 6 dedicated on the team (source) + backup from the rest of Talis
Socrata
Status and background: Socrata was founded in 2007. The company was originally called Blist and presented itself as a personal online database solution, akin to DabbleDB and Zoho Creator. The company took in $6.5M in early 2008 and soon thereafter changed the name to Socrata. Socrata now presents itself primarily as a publishing platform for open data efforts, focusing on customers such as local governments, and seems to have been quite successful in that field. Some of the data published by their clients is also available through Socrata Open Data, a publicly available data collection with more than 10,000 data sets.
In their own words: “Socrata is an online community for producers, publishers, and consumers of data. Through a suite of innovative Web services, Socrata provides the world’s most comprehensive platform for social data discovery. Social data discovery is a transformational, participatory process for achieving business and government transparency, promoting civic engagement, improving decision and policy making, and creating new audiences and markets for untapped data assets currently obscured in file formats such as CSV and PDF.” – http://www.socrata.com/company-info/
Business idea: Socrata offers their data publishing platform to local governments and others as a hosted solution, starting at $499 per month. Their pricing page shows various plans and optional features that are available at additional cost. While this makes Socrata more of a data publishing platform than a data market, the interaction between the platform efforts and Socrata Open Data makes me curious. If Socrata continues their success in signing up local governments and others, the interplay between the platform services and the central repository at opendata.socrata.com could become a really interesting one.
Key people: Kevin Merritt (@kmerritt), founder & CEO; Patrick Behrens, CFO
Team size: 20-30? (guesstimate based on a LinkedIn search)
A couple of start-up companies have already come and gone in this space, most notably Swivel and Verifiable. Robert Kosara (@eagereyes) has written invaluable post-mortems of both companies, interviewing the founders and getting their reflections on the space in hindsight, where they failed, what they did right and so on. These pieces are a must read for anybody trying to build a company in the space of data markets, DaaS:
The lessons from these pieces are actually so important that I almost think it should be made mandatory for every failed start-up to share their stories in a similar way. Kudos to Robert and the founders for sharing!
Vertical Data Market Places
There are a number of vertical data market places out there: companies that may have similar business plans to the ones mentioned above, but focus on a very specific kind – or topic – of data. Examples include Exelate – targeted marketing, AggData – store locations, SimpleGeo – geographical data, and countless others. I believe many of these vertical players may already be quite successful businesses on their own, but I leave them out of this analysis as their scope is more limited, often even down to very specialized (yet sometimes lucrative) niche segments.
The Old Dogs
These are the elephants in the room (some would say dinosaurs), often wrongly overlooked when analyzing the emerging field of data markets. The fact of the matter is that data markets are already a huge industry. In statistical or quantitative data alone the turnover counts in billions of dollars annually, and is rising. Think about the financial data products of companies like Bloomberg, Reuters, FactSet, etc. and you’ll know what I mean. These are heavily priced services, often based on massive legacy infrastructure, delivering mainly premium and/or proprietary (actually often faux-proprietary) data to the financial industry. In other words, as far from the “2.0 world” as one could possibly imagine.
The time is certainly ripe for some disruptive players in this field, coming in with a modern approach: built from the ground up with the Web as a distribution channel, modestly priced, and opening up the data market space by reaching an audience outside the narrowly defined financial sector user base of these services.
There is data out there – free and premium alike – that can help almost any business make better plans and decisions. Connecting these businesses with the data that they need will release phenomenal value. Tapping into just a fraction of that will be a hugely successful business for those that get it right.