Here are a few lists of resources you might find useful, whether you want to become a data scientist or already are one.

## Big Data Analysis Courses

IBM’s BigDataUniversity provides many introductory courses on Hadoop, Pig, Jaql, Hive, etc.

Some good courses on Coursera:

- Data Analysis and Statistical Inference
- Computing for Data Analysis
- Machine Learning
- Financial Engineering and Risk Management
- Social and Economic Networks: Models and Analysis
- Probabilistic Graphical Models

## Programming

There are plenty of programming languages, and good documentation for them, on the web. I have programmed in C++ on Linux for 15+ years, but what I use at the moment is Java, Python and R on a Mac. Here are some things I find useful:

Spyder, a Python IDE. On a Mac, try Anaconda, which bundles NumPy, SciPy, Matplotlib and pandas (a data analysis library) nicely together.

ROOT, the open-source, C++-based data analysis framework from CERN; there is also a Python interface.

## Statistics

StatsModels, a statistics package for Python

The R Project for Statistical Computing, and CRAN, the archive of R packages

RooStats, the statistics package extension for ROOT

## Machine Learning

scikit-learn, for machine learning in Python

TMVA, the C++-based machine learning extension for ROOT, lets you run a large number of classification algorithms at once and compare them.
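To make the "run several classifiers on the same data and compare them" idea concrete, here is a minimal sketch in plain Python. It is not TMVA's or scikit-learn's API: the three toy classifiers, the synthetic two-blob data set and every name in it (`make_data`, `compare`, `CLASSIFIERS`, and so on) are assumptions made up for illustration, and it sticks to the standard library so it runs anywhere.

```python
import random
from statistics import mean


def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def make_data(n, seed):
    """Synthetic data (assumption): two Gaussian blobs, labelled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 0.0 if label == 0 else 3.0
        point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((point, label))
    return data


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def fit_nearest_centroid(train):
    """Predict the class whose mean training point is closest."""
    centroids = {}
    for label in {y for _, y in train}:
        pts = [x for x, y in train if y == label]
        centroids[label] = (mean(p[0] for p in pts), mean(p[1] for p in pts))
    return lambda x: min(centroids, key=lambda c: dist2(x, centroids[c]))


def fit_one_nearest_neighbour(train):
    """Predict the label of the single closest training point."""
    return lambda x: min(train, key=lambda t: dist2(x, t[0]))[1]


def accuracy(predict, test):
    """Fraction of test points whose label is predicted correctly."""
    return sum(predict(x) == y for x, y in test) / len(test)


def compare(classifiers, train, test):
    """Train every classifier on the same data and score it on the same test set."""
    return {name: accuracy(fit(train), test) for name, fit in classifiers.items()}


CLASSIFIERS = {
    "majority": fit_majority,
    "nearest_centroid": fit_nearest_centroid,
    "one_nearest_neighbour": fit_one_nearest_neighbour,
}

train = make_data(200, seed=1)
test = make_data(100, seed=2)
results = compare(CLASSIFIERS, train, test)

# Print the classifiers from best to worst test accuracy.
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} {acc:.2f}")
```

The point of the design is that every method sees exactly the same training and test split, so the accuracies are directly comparable. In practice the toy functions would be replaced by real estimators, for example scikit-learn classifiers with their `fit` and `predict` methods.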