DataMarket blog

Data, visualization and startup life

Using DataMarket from within R

This one is for the R users among you — we know there are plenty!

You can pull any data from DataMarket directly into your R session using the rdatamarket package. And it’s so simple! Here’s how.

Quick start

To install the package, execute this in R:

install.packages('rdatamarket')
library(rdatamarket)

Then, from any data view on DataMarket.com (say you’re looking at oil production figures for Angola, Brunei and Egypt), copy the URL from your browser and paste it into a call to dmseries or dmlist:

l <- dmlist("http://datamarket.com/data/set/17tm/#ds=17tm|kqc=17.v.i")

Short URLs (data.is, bit.ly, is.gd, t.co) work too:

l <- dmlist("http://data.is/nyFeP9")

The dmlist function returns a data.frame object. To get a zoo time series object instead, use dmseries:

plot(dmseries("http://data.is/nyFeP9"))
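
For readers who have not worked with zoo before, here is a minimal offline sketch of what such an object looks like and how it behaves. It does not call DataMarket at all: the object is built by hand from a few of the Somalia and Sweden population values printed later in this post, so the variable names (dates, pop, persons, change) are illustrative only.

```r
library(zoo)

# Values copied from the population example later in this post
# (male population, in thousands).
dates <- as.Date(c("2010-07-01", "2015-07-01", "2020-07-01"))
pop <- zoo(cbind(Somalia = c(4642.070, 5357.233, 6211.305),
                 Sweden  = c(4613.551, 4725.918, 4840.434)),
           order.by = dates)

# Arithmetic works element-wise and keeps the dates: thousands -> persons.
persons <- pop * 1000

# Change between consecutive observations, still in thousands.
change <- diff(pop)

# Restrict to a date range.
window(pop, start = as.Date("2015-01-01"))
```

Because the dates travel with the values, plot(pop) would draw both series against time without any extra bookkeeping.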

If you need to go through an HTTP proxy, set it up this way:

dmCurlOptions(proxy="http://outproxy.mycompany.com")
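
If you would rather not hard-code the proxy address in every script, one option is to read it from an environment variable. This is only a sketch under assumptions: the variable name DM_PROXY and the helper proxy_options are hypothetical (they are not part of rdatamarket), and the only thing relied on from the package is the proxy argument of dmCurlOptions shown above.

```r
# Hypothetical helper: build the argument list for dmCurlOptions() from an
# environment variable. DM_PROXY is an assumed name, not an rdatamarket feature.
proxy_options <- function(var = "DM_PROXY") {
  proxy <- unname(Sys.getenv(var, unset = ""))
  if (nzchar(proxy)) list(proxy = proxy) else list()
}

# Normally this would be set outside R (e.g. in your shell profile).
Sys.setenv(DM_PROXY = "http://outproxy.mycompany.com")

opts <- proxy_options()
str(opts)

# Apply the options only if a proxy is configured and rdatamarket is installed.
if (length(opts) > 0 && requireNamespace("rdatamarket", quietly = TRUE)) {
  do.call(rdatamarket::dmCurlOptions, opts)
}
```

When the variable is unset, the helper returns an empty list and nothing is changed.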

Reading metadata

Get a dataset object with dminfo. You can pass either the dataset ID (found in a DataMarket URL) or the whole URL; the two calls below are equivalent:

oil <- dminfo("17tm")
oil <- dminfo("http://datamarket.com/data/set/17tm/#ds=17tm|kqc=17.v.i")
print(oil)

This yields:

Title: "Oil: Production tonnes"
Provider: "BP"
Dimensions:
  "Country" (60 values):
    "Algeria"
    "Angola"
    "Argentina"
    "Australia"
    "Azerbaijan"
    [...]

See all the values of the Country dimension:

oil$dimensions[[1]]$values

This yields:

 a "Algeria"
17 "Angola"
 d "Argentina"
 z "Australia"
1l "Azerbaijan"
1b "Brazil"
 v "Brunei"
1h "Cameroon"
13 "Canada"
1o "Chad"
[...]
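
The short codes on the left are the value IDs, and the quoted strings are the titles. The IDs 17 and v (Angola and Brunei) also show up in the kqc=17.v.i part of the URL used in the quick start. Here is a small offline sketch of looking values up in either direction. It assumes the values behave like a named character vector whose names are the IDs; the actual object returned by rdatamarket may be shaped differently, and the helpers ids_for and titles_for are made up for illustration.

```r
# The ten values printed above, modelled as a named character vector
# (names are the value IDs). This shape is an assumption.
country_values <- c(a = "Algeria", "17" = "Angola", d = "Argentina",
                    z = "Australia", "1l" = "Azerbaijan", "1b" = "Brazil",
                    v = "Brunei", "1h" = "Cameroon", "13" = "Canada",
                    "1o" = "Chad")

# Hypothetical helpers: titles -> IDs and IDs -> titles.
ids_for <- function(values, titles) names(values)[match(titles, values)]
titles_for <- function(values, ids) unname(values[ids])

ids_for(country_values, c("Angola", "Brunei"))
titles_for(country_values, c("17", "v"))
```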

Here’s a dataset with two dimensions (besides time):

p <- dminfo("http://datamarket.com/data/set/12r9/male-population-thousands")
print(p)

Title: "Male population (thousands)"
Provider: "United Nations" (citing "United Nations Population Division")
Dimensions:
  "Country or Area" (229 values):
    "Afghanistan"
    "Africa"
    "Albania"
    "Algeria"
    "Angola"
    [...]
  "Variant" (5 values):
    "Constant-fertility scenario"
    "Estimate variant"
    "High variant"
    "Low variant"
    "Medium variant"

Reading data

From that last dataset, fetch the UN’s projection of the male population of Somalia and Sweden in the constant-fertility scenario (the values are in thousands, as the “(thousands)” in the dataset title indicates):

dmseries(p, 'Country or Area'=c("Somalia", "Sweden"),
         Variant="Constant-fertility scenario")

             Somalia   Sweden
2010-07-01  4642.070 4613.551
2015-07-01  5357.233 4725.918
2020-07-01  6211.305 4840.434
2025-07-01  7243.572 4942.865
2030-07-01  8490.929 5021.646
2035-07-01  9990.910 5083.680
2040-07-01 11793.524 5144.685
2045-07-01 13966.319 5211.212
2050-07-01 16597.110 5281.437

The same as a data.frame:

dmlist(p, 'Country or Area'=c("Somalia", "Sweden"),
       Variant="Constant-fertility scenario")

   Country.or.Area                     Variant Year     Value
1          Somalia Constant-fertility scenario 2010  4642.070
2          Somalia Constant-fertility scenario 2015  5357.233
3          Somalia Constant-fertility scenario 2020  6211.305
4          Somalia Constant-fertility scenario 2025  7243.572
5          Somalia Constant-fertility scenario 2030  8490.929
6          Somalia Constant-fertility scenario 2035  9990.910
7          Somalia Constant-fertility scenario 2040 11793.524
8          Somalia Constant-fertility scenario 2045 13966.319
9          Somalia Constant-fertility scenario 2050 16597.110
10          Sweden Constant-fertility scenario 2010  4613.551
11          Sweden Constant-fertility scenario 2015  4725.918
12          Sweden Constant-fertility scenario 2020  4840.434
13          Sweden Constant-fertility scenario 2025  4942.865
14          Sweden Constant-fertility scenario 2030  5021.646
15          Sweden Constant-fertility scenario 2035  5083.680
16          Sweden Constant-fertility scenario 2040  5144.685
17          Sweden Constant-fertility scenario 2045  5211.212
18          Sweden Constant-fertility scenario 2050  5281.437
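
The data.frame comes back in long format, with one row per country and year. If you want one column per country instead (the layout the zoo version uses), base R’s reshape can do it. The sketch below runs offline on a few rows copied from the output above; the names long and wide are just illustrative.

```r
# A few rows copied from the output above (values are in thousands).
long <- data.frame(
  Country.or.Area = rep(c("Somalia", "Sweden"), each = 3),
  Variant = "Constant-fertility scenario",
  Year = rep(c(2010, 2015, 2020), times = 2),
  Value = c(4642.070, 5357.233, 6211.305, 4613.551, 4725.918, 4840.434)
)

# One row per year, one column per country.
wide <- reshape(long[, c("Country.or.Area", "Year", "Value")],
                idvar = "Year", timevar = "Country.or.Area",
                direction = "wide")

# reshape() names the new columns Value.Somalia and Value.Sweden;
# strip the prefix for readability.
names(wide) <- sub("^Value\\.", "", names(wide))
wide
```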

The examples above demonstrate dimension filtering: name a dimension and the values you want, and only the matching data is fetched. Dimensions and their values can be specified by either their $id or their $title. If no filter is specified, the whole dataset is fetched. Be careful: some datasets are enormous, and the DataMarket.com API may truncate extremely large responses.
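
To get a feel for how big an unfiltered fetch might be, you can multiply the number of values in each dimension. Here is a rough offline sketch: series_count is a hypothetical helper (not part of rdatamarket), the mock object only mimics the dimensions[[i]]$values shape used earlier in this post, and the result is an upper bound that assumes every combination of values actually has data.

```r
# Hypothetical helper: upper bound on the number of series in a dataset,
# assuming every combination of dimension values exists.
series_count <- function(info) {
  prod(vapply(info$dimensions, function(d) length(d$values), integer(1)))
}

# Mock object with the same shape as a dminfo() result as accessed above
# (oil$dimensions[[1]]$values). Sizes are taken from the printed metadata of
# the male population dataset: 229 countries or areas, 5 variants.
mock_info <- list(dimensions = list(
  list(title = "Country or Area", values = character(229)),
  list(title = "Variant", values = character(5))
))

series_count(mock_info)  # 229 * 5 = 1145
```

With a real dataset object you would pass the result of dminfo instead of the mock, provided its structure matches this assumption.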

Written by Gunnlaugur Þór Briem

October 31, 2011 at 12:55 am

6 Responses

  1. [...] the Revolutions blog: The good folks at DataMarket have posted a new tutorial on using the rdatamarket package (covered here in August) to easily download public data sets into R for [...]

  2. Thanks for this. After setting the proxy location as suggested with dmCurlOptions(), I get the error message “Error: Proxy Authentication Required”. This is at least different to what I was getting before, which suggests I’m on the right track… How do I authenticate myself? I note also that setInternet2(TRUE), which works for most of our R packages accessing the internet doesn’t work for rdatamarket – why is this I wonder?

    Peter Ellis

    October 21, 2013 at 12:04 am

  3. Hi Peter,

    That sounds like your proxy server requires you to authenticate against it. With any luck you simply need to add a username and password, either as a separate option:

    dmCurlOptions(proxy="http://outproxy.mycompany.com", proxyuserpwd="myuser:mypass")

    or directly in the proxy URL:

    dmCurlOptions(proxy="http://myuser:mypass@outproxy.mycompany.com")

    … neither of which I’ve tested. If your proxy server uses NTLM authentication, you’ll also need proxyauth="ntlm".

    Hope that helps!

    Gunnlaugur Þór Briem

    October 21, 2013 at 10:20 am

    • Thanks, that worked a treat. Note, to avoid leaving our passwords in a script file we use a little tcl/tk function to temporarily store it in an R object and paste it into the dmCurlOptions without it being written down.

      Peter Ellis

      October 21, 2013 at 10:33 pm

  4. […] latest data source of financial data accessible from R I came across. A good tutorial can be found here. I updated the table and the descriptions […]

