A Big Data Course

The published diagram details the structure of a Big Data course.

  1. Fundamentals of databases (SQL and NoSQL), statistics (the R environment) and graph databases
  2. The focus is on the Hadoop ecosystem and its programming paradigm, MapReduce
  3. MapReduce can also be used through easier-to-master, high-level query languages such as Pig and Hive
  4. While Hadoop is designed for batch processing, other tools cover other usage areas:
    • Real-time data access with HBase, a NoSQL database
    • Fast processing of smaller data volumes with Spark
    • Machine learning with Mahout, a framework that can run on top of Hadoop
  5. To use these tools, a well-built, secure cluster has to be planned, developed and then operated securely
  6. After the data analysis is done, the final steps cover visualization, so that the analytic results make an impact
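
To make the MapReduce paradigm from points 2 and 3 more concrete, here is a minimal sketch of a word count written as map, shuffle and reduce steps. It runs in memory in plain Python and does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, not part of any Hadoop API or of the course material.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data course", "big data tools"]))
```

On a real cluster the map and reduce steps would run in parallel on many nodes and the framework would handle the shuffle; a Pig or Hive query expresses the same kind of grouping and aggregation at a higher level.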

A derivative of the course is held at the University of Szeged.
