The first thing a data scientist needs is, guess what: data. Unfortunately, a lot of time is spent acquiring, cleaning, and formatting it. But things are getting less restricted, better debugged, more standardized, and faster. Here is a list of APIs that I like. I will expand it over time.
There are separate APIs to access the World Bank's Open Data Catalog (here), Climate Data, and World Bank Finance data (API Sources overview). You can do this in Python, for instance, with this package (see documentation).
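If you would rather not depend on a package, the indicator data can also be requested over plain HTTP. The following is a minimal sketch, not the package mentioned above. It assumes the v2 endpoint layout `country/<code>/indicator/<id>` and a JSON response shaped as a two-element list (paging metadata, then records). The helper names `indicator_url`, `parse_indicator`, and `fetch_indicator` are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the World Bank indicators API, version 2.
BASE_URL = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, per_page=100):
    """Build the request URL for one country and one indicator."""
    query = urlencode({"format": "json", "per_page": per_page})
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


def parse_indicator(payload):
    """Turn a decoded response into (year, value) pairs.

    Assumption: a successful response is [metadata, records]; anything
    else (for example an error message) yields an empty list.
    """
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return []
    return [
        (row["date"], row["value"])
        for row in payload[1]
        if row.get("value") is not None  # years without data come back as null
    ]


def fetch_indicator(country, indicator):
    """Download and parse one indicator series (needs network access)."""
    with urlopen(indicator_url(country, indicator), timeout=30) as resp:
        return parse_indicator(json.load(resp))
```

Calling `fetch_indicator("DE", "SP.POP.TOTL")` should then return the total population series for Germany, assuming the indicator code is still valid.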
You can access the database of Tweets via the REST API. It offers a large set of functions, but request rates are quite limited. This page lists some Python (and other) packages for handling Twitter requests.
The New York Times API
There are many different APIs; the most interesting is the Article Search API, which lets you find all New York Times articles matching certain keywords in the headline, abstract, author, etc. The API directly returns a lot of metadata, including a URL to the main article. The number of articles one can access is throttled; downloading more requires some extra work.
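To give an idea of what the returned metadata looks like in practice, here is a small sketch of a keyword search. It assumes the v2 Article Search endpoint, an API key passed as the `api-key` query parameter, and a response whose articles sit under `response.docs` with `web_url`, `headline.main`, and `pub_date` fields. The function names are hypothetical, and you need your own key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: endpoint of the Article Search API, version 2.
SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def search_url(query, api_key, page=0):
    """Build the request URL; `page` selects one page of results."""
    return SEARCH_URL + "?" + urlencode({"q": query, "page": page, "api-key": api_key})


def extract_articles(payload):
    """Reduce a decoded response to headline, URL, and publication date."""
    docs = payload.get("response", {}).get("docs", [])
    return [
        {
            "headline": (doc.get("headline") or {}).get("main"),
            "url": doc.get("web_url"),
            "date": doc.get("pub_date"),
        }
        for doc in docs
    ]


def search_articles(query, api_key, page=0):
    """Run one search request (needs network access and a valid key)."""
    with urlopen(search_url(query, api_key, page), timeout=30) as resp:
        return extract_articles(json.load(resp))
```

Because access is throttled, collecting more than one page means looping over `page` and pausing between requests.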
Google Maps API
You can query directions, places, or Street View images. One constraint is that the derived data has to be displayed on top of a Google map again. There are a lot of nice tools to do this. There is an extensive reference manual covering all of the API's functions. Unfortunately, for the free version the number of requests one can send per day is quite limited.
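As an illustration of a directions query, the sketch below builds a request for the Directions web service and pulls distance and duration out of the answer. It assumes the JSON endpoint `maps/api/directions/json`, the query parameters `origin`, `destination`, `mode`, and `key`, and a response with a `status` field and `routes[0].legs[0]`. The function names are made up, and you need your own API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: JSON endpoint of the Directions web service.
DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"


def directions_url(origin, destination, api_key, mode="driving"):
    """Build the request URL for a route between two places."""
    params = {"origin": origin, "destination": destination, "mode": mode, "key": api_key}
    return DIRECTIONS_URL + "?" + urlencode(params)


def summarize_route(payload):
    """Return (distance, duration) as text for the first route, or None."""
    if payload.get("status") != "OK" or not payload.get("routes"):
        return None  # e.g. quota exceeded or no route found
    leg = payload["routes"][0]["legs"][0]
    return leg["distance"]["text"], leg["duration"]["text"]


def fetch_route(origin, destination, api_key):
    """Run one directions request (needs network access and a valid key)."""
    with urlopen(directions_url(origin, destination, api_key), timeout=30) as resp:
        return summarize_route(json.load(resp))
```

Checking `status` before reading the route matters here, since a request that runs over the daily limit still returns a JSON body, only without routes.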
Quandl.com provides API access to a quickly growing set of financial and economic data sets. You can make up to 5000 requests per day if you register via email. There are packages for direct access from both Python and R.