Knowledge Sources

Here are a few lists of resources you might find useful, whether you want to become a data scientist or already are one.

Big Data Analysis Courses

IBM’s BigDataUniversity provides many introductory courses on Hadoop, Pig, Jaql, Hive, etc.

Some good courses on Coursera:

Programming

There are plenty of programming languages with good documentation on the web. I programmed in C++ on Linux for more than 15 years, but at the moment I work in Java, Python and R on a Mac. Here are some things I find useful:

Spyder, a Python IDE. On a Mac, try Anaconda, which bundles NumPy, SciPy, Matplotlib and pandas (a data analysis library) nicely together.

ROOT, the open-source, C++-based data analysis framework from CERN; there is also a Python interface

Statistics

StatsModels, a statistics package for Python

The R Project for Statistical Computing, and the archive of R packages

RooStats, the statistics package extension for ROOT

Machine Learning

scikit-learn for machine learning in Python

TMVA, the C++-based machine learning extension for ROOT, lets you run a large number of classification algorithms at once and compare them.
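To give a feel for what "run several classifiers at once and compare them" looks like in practice, here is a minimal sketch using scikit-learn rather than TMVA. It is not TMVA's interface; the function name `compare_classifiers`, the choice of four models, and the use of the iris toy dataset are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, folds=5):
    """Return {name: mean cross-validated accuracy} for a few classifiers.

    Hypothetical helper: the model selection below is an arbitrary example.
    """
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    scores = {}
    for name, model in candidates.items():
        # cross_val_score trains and scores a fresh copy of the model on each fold
        scores[name] = cross_val_score(model, X, y, cv=folds).mean()
    return scores


if __name__ == "__main__":
    # The iris toy dataset ships with scikit-learn, so nothing is downloaded.
    X, y = load_iris(return_X_y=True)
    results = compare_classifiers(X, y)
    # Print the classifiers from best to worst mean accuracy.
    for name, accuracy in sorted(results.items(), key=lambda item: -item[1]):
        print(f"{name:22s} {accuracy:.3f}")
```

Scoring every model on the same cross-validation folds keeps the comparison fair; for real data you would probably also want to look at more than plain accuracy.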

 
