Introduction to Map-Reduce -
Introduction to Hadoop, Map-Reduce, pipelining, Cascading, Pig and Hive.
This chapter presents the benefits of higher-level abstractions over Map-Reduce (concepts and capabilities).
Get ready for Scalding -
Background on Scalding, the Scala domain-specific language built on top of Cascading.
Development environment setup, including a local Hadoop cluster for development.
Execute the first Hello World Scalding example.
Scalding by example -
The core capabilities of Scalding: i) map-like functions, ii) grouping/reducing functions, iii) join operations.
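As a rough illustration of these three kinds of operation, the sketch below uses plain Scala collections rather than the Scalding API, so it runs without Hadoop or Cascading. The `CoreOps` object and the page-view and user records in it are hypothetical, invented only for this example.

```scala
// Plain Scala collections standing in for Scalding pipes (illustrative only).
object CoreOps {
  // Hypothetical records: (userId, url) page views and (userId, country) profiles.
  val views: List[(Int, String)] =
    List((1, "/home"), (2, "/news"), (1, "/news"), (3, "/home"))
  val users: List[(Int, String)] = List((1, "UK"), (2, "GR"))

  // i) Map-like: transform each record independently of the others.
  val urls: List[String] = views.map { case (_, url) => url.stripPrefix("/") }

  // ii) Grouping/reducing: group by user id, then reduce each group to a count.
  val viewsPerUser: Map[Int, Int] =
    views.groupBy(_._1).map { case (id, vs) => id -> vs.size }

  // iii) Join: inner join of views with user profiles on the user id.
  // Views whose user has no profile (user 3 here) are dropped.
  val userCountry: Map[Int, String] = users.toMap
  val joined: List[(Int, String, String)] =
    for {
      (id, url) <- views
      country   <- userCountry.get(id).toList
    } yield (id, url, country)
}
```

In Scalding the same three steps would be expressed on pipes and executed as Map-Reduce jobs; the collection version only shows the shape of the data flow.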
Intermediate examples -
A Scalding log-processing flow for a news company, aggregating multiple sources, is presented.
An example with multiple pipelines then introduces some more advanced concepts.
Scalding Design Patterns -
Interesting design patterns applicable to Scalding data processing applications. Using the 'External Operations' pattern enables us to perform unit testing and to structure our applications in a modular way.
Testing & TDD -
Best practices of first defining behaviour (Behaviour Driven Development), then writing tests (Test Driven Development), and then completing the implementation. How to write unit and integration tests, and how to apply black-box testing methodologies in the context of Big Data.
Running Scalding in Production -
Tips and tricks on how to execute and schedule jobs, and how to co-ordinate the execution of Scalding/Scala/Java and even external system processes. Finally, how to configure Scalding jobs using property files or Hadoop parameters, how to monitor and optimize jobs, and other useful tips.
Using external data stores -
Interaction with external SQL, NoSQL and in-memory data stores such as HBase, SQL databases, ElasticSearch etc.
Matrix Calculations and Machine Learning -
Matrix calculations using the Matrix API and Algebird to calculate text similarity (TF-IDF)
and set similarity (Jaccard), followed by an example of Mahout K-Means clustering and outlier detection.
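For readers unfamiliar with the two similarity measures named above, here is a minimal sketch in plain Scala. It does not use the Scalding Matrix API or Algebird. The `Similarity` object and its function names are hypothetical, and the TF-IDF weighting shown (raw term count multiplied by ln(N / df)) is one common variant chosen as an assumption, not necessarily the one used in the chapter.

```scala
object Similarity {
  // Jaccard similarity: |A ∩ B| / |A ∪ B|.
  // By convention here, two empty sets have similarity 0.0.
  def jaccard[A](a: Set[A], b: Set[A]): Double = {
    val unionSize = (a union b).size
    if (unionSize == 0) 0.0 else (a intersect b).size.toDouble / unionSize
  }

  // TF-IDF per (document, term): raw term count * ln(N / df),
  // where N is the number of documents and df the number containing the term.
  def tfIdf(docs: Map[String, Seq[String]]): Map[(String, String), Double] = {
    val n = docs.size.toDouble
    // Document frequency: count each term once per document it appears in.
    val df: Map[String, Int] =
      docs.values.toSeq
        .flatMap(_.distinct)
        .groupBy(identity)
        .map { case (term, xs) => term -> xs.size }
    for {
      (doc, terms)        <- docs
      (term, occurrences) <- terms.groupBy(identity)
    } yield ((doc, term), occurrences.size * math.log(n / df(term)))
  }
}
```

A term that appears in every document gets weight 0 under this variant, since ln(N / N) = 0, which is the intended effect of down-weighting common words.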