
Apache Spark is an open-source cluster computing system that aims to make data analytics fast, both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and it supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop. To make programming faster, Spark provides clean, concise APIs in Scala, Java and Python. You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets.

Prerequisites to attend:

  • Basic knowledge of Hadoop, Linux, and Java or Python

Total Course Duration:

  • 40 hours

Course Contents:

Day 1

  • Introduction to Spark Fundamentals
  • Architecture & Programming Model
  • Spark Deployment Components
  • Spark Installation and Configuration

Day 2

  • RDD in Detail
  • Spark Execution Models
  • Life Cycle of a Spark Job
  • Hands-on Spark Examples
  • Spark Internals (Under the Hood)

Day 3

  • Caching and Persistence with RDD
  • Spark Task Scheduler

Day 4

  • Spark Streaming
  • Hands-on Spark Streaming

Day 5

  • Spark SQL vs Shark vs Hive
  • The SchemaRDD Concept and Support for Different File Formats (Parquet, JSON, ORC, etc.)
  • Hands-on with Spark SQL

Day 6

  • MLlib -> Machine Learning with Spark
  • GraphX -> Graph Processing with Spark
  • Hands-on with Spark MLlib and GraphX

Days 7 and 8

  • Introduction to Tachyon
  • Hands-on, End-to-End Project on Spark and Its Ecosystem


