Data Sources

The first thing a Data Scientist needs is, guess what: data. Unfortunately, a lot of time is spent acquiring, cleaning, and formatting it. But things are improving: data sources are getting less restricted, better debugged, more standardized, and faster. Here is a list of APIs that I like. I will expand it over time.

World Bank

There are separate APIs to access the World Bank's Open Data Catalog (here), Climate Data, and World Bank Finance data (API Sources overview). You can do this, for instance, in Python with this package (see documentation).
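If you would rather not depend on a package, the indicator endpoint can also be called directly. Here is a minimal sketch using only the standard library. It assumes the v2 endpoint at api.worldbank.org, annual data, and a response that is a two-element JSON list (paging metadata first, records second). The function names are my own, not part of any World Bank package.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the World Bank indicators API (v2).
BASE_URL = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, start_year, end_year, per_page=100):
    """Build the request URL for one indicator and one country code (or 'all')."""
    query = urlencode(
        {
            "format": "json",
            "date": f"{start_year}:{end_year}",
            "per_page": per_page,
        },
        safe=":",  # keep the year range readable instead of encoding the colon
    )
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


def parse_indicator(payload):
    """Turn the decoded JSON response into a {year: value} dict.

    Assumes annual data and a two-element list: [metadata, records].
    Records without a value are skipped.
    """
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return {}
    return {
        int(row["date"]): row["value"]
        for row in payload[1]
        if row["value"] is not None
    }


def fetch_indicator(country, indicator, start_year, end_year):
    """Download and parse one indicator series (performs a network request)."""
    url = indicator_url(country, indicator, start_year, end_year)
    with urlopen(url, timeout=30) as resp:
        return parse_indicator(json.load(resp))
```

For example, `fetch_indicator("BR", "SP.POP.TOTL", 2000, 2005)` should return Brazil's total population by year, assuming the indicator code is still valid.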


Twitter

You can access the database of Tweets via the REST API. There is a large set of functions; however, request rates are quite limited. This page lists some Python (and other) packages for handling Twitter requests.
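Because of the rate limits, it helps to throttle requests on the client side rather than run into errors. The sketch below is a generic sliding-window throttle, not part of any Twitter package. The `Throttle` class and its parameters are hypothetical, and the actual limits depend on the endpoint, so check them before picking numbers.

```python
import time


class Throttle:
    """Allow at most `max_calls` calls within any window of `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.calls = []  # timestamps of the calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have already left the window.
            self.calls = [t for t in self.calls if now - t < self.period]
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Window is full: sleep until the oldest call expires, then re-check.
            self.sleep(self.period - (now - self.calls[0]))
```

Usage would look like `throttle = Throttle(max_calls=15, period=15 * 60)`, followed by `throttle.wait()` before each request. The numbers here are placeholders.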

Wikipedia API

You can access the Wikipedia database through the MediaWiki API. There are two Python packages that should help (1, 2).
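To get a feeling for what the MediaWiki API returns, here is a small sketch of a full-text search against the English Wikipedia, using only the standard library. It assumes the `action=query` / `list=search` module and its usual JSON layout. The helper names and the User-Agent string are placeholders of mine.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint of the English Wikipedia; other languages use their own subdomain.
API_URL = "https://en.wikipedia.org/w/api.php"


def search_url(term, limit=5):
    """Build a full-text search request for the MediaWiki API."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def titles_from_response(payload):
    """Extract page titles from a decoded search response."""
    return [hit["title"] for hit in payload.get("query", {}).get("search", [])]


def search(term, limit=5):
    """Run the search (performs a network request)."""
    # A descriptive User-Agent is good practice; this value is a placeholder.
    req = Request(search_url(term, limit),
                  headers={"User-Agent": "data-sources-example/0.1"})
    with urlopen(req, timeout=30) as resp:
        return titles_from_response(json.load(resp))
```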

The New York Times API

There are many different APIs; the most interesting is the Article Search API, which lets you find all New York Times articles matching certain keywords in the headline, abstract, author, etc. The API directly returns a lot of metadata, including a URL to the main article. The number of articles one can access is throttled; downloading more requires some extra work.
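As a rough illustration of the Article Search API, the sketch below builds a request and pulls headline and URL out of the returned metadata. It assumes the v2 endpoint, an `api-key` query parameter, a zero-based `page` parameter for paging through results, and a `response.docs` list in the JSON. You need your own API key, and the function names are my own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Article Search API (version 2).
API_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def article_search_url(query, api_key, page=0):
    """Build the request URL for one page of search results."""
    params = {"q": query, "page": page, "api-key": api_key}
    return f"{API_URL}?{urlencode(params)}"


def extract_articles(payload):
    """Return (headline, url) pairs from a decoded search response."""
    docs = payload.get("response", {}).get("docs", [])
    return [
        (doc.get("headline", {}).get("main"), doc.get("web_url"))
        for doc in docs
    ]


def search_articles(query, api_key, page=0):
    """Fetch one page of results (performs a network request)."""
    with urlopen(article_search_url(query, api_key, page), timeout=30) as resp:
        return extract_articles(json.load(resp))
```

To collect more than one page, you would call `search_articles` with increasing `page` values and pause between calls to stay within the rate limit.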

Google Maps API

You can query directions, places, or Street View images. One constraint is that the derived data has to be displayed on top of a Google map again. There are a lot of nice tools to do this. There is an extensive reference manual covering all of the API's functions. Unfortunately, for the free version the number of requests one can send per day is quite limited.
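For directions, a request boils down to an origin, a destination, and a key. The following sketch assumes the Directions web service endpoint and its usual JSON layout (`status`, then `routes`, `legs`, `distance`, and `duration`). The helper names are hypothetical, and you need your own API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Directions web service (JSON output).
API_URL = "https://maps.googleapis.com/maps/api/directions/json"


def directions_url(origin, destination, api_key):
    """Build a directions request between two free-text locations."""
    params = {"origin": origin, "destination": destination, "key": api_key}
    return f"{API_URL}?{urlencode(params)}"


def first_leg_summary(payload):
    """Return (distance, duration) as display strings, or None on failure."""
    if payload.get("status") != "OK" or not payload.get("routes"):
        return None
    leg = payload["routes"][0]["legs"][0]
    return leg["distance"]["text"], leg["duration"]["text"]


def get_directions(origin, destination, api_key):
    """Query the service (performs a network request)."""
    with urlopen(directions_url(origin, destination, api_key), timeout=30) as resp:
        return first_leg_summary(json.load(resp))
```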

Quandl

Quandl provides API access to a quickly growing collection of financial and economic data sets. You can make up to 5000 requests per day if you register via email. There are packages for direct access from both Python and R.

United Nations

The UNdata API is still pretty much under development; a temporary solution exists here.



