Here are a few lists of resources you might find useful, whether you want to become a data scientist or already are one.
Big Data Analysis Courses
IBM’s BigDataUniversity provides many introductory courses on Hadoop, Pig, Jaql, Hive, etc.
Some good courses on Coursera:
- Data Analysis and Statistical Inference
- Computing for Data Analysis
- Machine Learning
- Financial Engineering and Risk Management
- Social and Economic Networks: Models and Analysis
- Probabilistic Graphical Models
There are tons of programming languages, and good documentation for them, on the web. I programmed in C++ on Linux for 15+ years, but what I use at the moment is Java, Python and R on a Mac. Here are some things I find useful:
- StatsModels, a statistics package for Python
- RooStats, the statistics package extension for ROOT
- scikit-learn, for machine learning in Python
- TMVA, the C++-based machine-learning extension for ROOT, which lets you run a large number of classification algorithms at once and compare them