
This is a comprehensive Hadoop course that addresses the needs of Hadoop administrators, developers and architects. The focus is to turn a non-Big Data professional into a Big Data expert in 8 weeks. The course is full of real-life use cases and real production examples. By the end of the course, administrators will be able to manage a cluster independently, developers will be able to develop in Map-Reduce, Hive, Pig, Oozie, Sqoop and Flume, and architects will be able to build a Big Data solution for any use case. The course is based on YARN.

Pre-requisites to attend:

  • Basic knowledge of Linux
  • Java
  • SQL

Total Course Duration:

  • 45 hours

Course Contents:

Introduction to Big Data

  • Introduction to Big Data and Hadoop
  • Big Data Technology Landscape
  • Why Big Data?
  • Difference between Big Data and Traditional BI?
  • Fundamentals of High Scalability
  • Distributed Systems and Challenges
  • Key Fundamentals for Big Data
  • Big Data Use Cases
  • End to End production use case deployed for Hadoop
  • When to use Hadoop and When not to?

YARN Concepts

  • Introduction to YARN
  • Architectural Differences between MRv1 and YARN
  • Introduction to Resource Manager
  • Node Manager Responsibility
  • Proxy Server
  • Job History Server
  • Running map-reduce programs in YARN

HIVE - Basics

  • Introduction to HIVE
  • Concepts on Meta-store
  • Hive Installation
  • Hive Configuration
  • Basics of Hive
  • What Hive cannot do?
  • When to use and not to use HIVE
  • Hive UDF, UDAF, UDTF
  • Writing custom UDF
  • SerDe and role of SerDe
  • Writing SerDe
  • Advanced Analytical Functions
  • Real Time Query
  • Difference between Stinger and Impala
  • Key Emerging Trends

Hadoop Administration - Monitoring

  • Monitoring
  • Monitoring Hadoop process
  • Hadoop Schedulers
  • FIFO Scheduler
  • Capacity Scheduler
  • Fair Scheduler
  • Difference between Fair and Capacity Schedulers
  • Hands on with Scheduler Configuration
  • Cluster Planning and Sizing
  • Hardware Selection Consideration
  • Sizing
  • Kernel Tuning
  • Network Topology Design
  • Hadoop Filesystem Quota
  • Hands on with a Few Hadoop Tuning Configurations
  • Hands on Sizing a 100 TB Cluster

Sqoop

  • Sqoop
  • Difference between Sqoop and Sqoop2
  • What are the various parameters in Export
  • What are the various parameters in Import
  • Typical challenges with Sqoop operations
  • How to tune Sqoop performance

Hadoop Security

  • Hadoop Design and Architecture
  • Security
  • Security Design for HDFS
  • Kerberos Fundamentals
  • Setting up KDC
  • Configuring Secured Hadoop Cluster
  • Setting up Multi-realm authentication for Production Deployment
  • Typical product deployment challenges with respect to Hadoop Security
  • Role of HttpFS proxy for corporate firewalls
  • Role of Cloudera Sentry and Knox
  • Common Failures and Problems
  • File system related issues
  • Map-Reduce related issues
  • Maintenance related issues
  • Monitoring related issues

Zookeeper

  • Architecture
  • High Scalability with Zookeeper
  • Common Recipes with Zookeeper
  • Leader Election
  • Distributed Transaction Management
  • Node Failure Detections and Cluster Membership management
  • Co-ordination Services
  • Cluster Deployment recipe with Zookeeper
  • Typical challenges with Zookeeper operations

HDFS Fundamentals

  • HDFS Fundamentals
  • Fundamentals behind HDFS Design
  • Key Characteristics of HDFS
  • HDFS Daemons
  • HDFS Commands
  • Anatomy of File Read and Write in HDFS
  • HDFS File System Metadata
  • How replication happens in Hadoop
  • How the replication strategy is defined and how network topology can be defined
  • When to use HDFS and when not to?

Hadoop Administration - Fundamentals

  • Hadoop Installation and Configuration
  • YARN Installation and Configuration
  • Internals of NameNode Metadata Structure
  • FSImage and Edit Logs
  • Viewing Name Node Metadata and Edit Logs
  • HDFS Name Node Federation
  • Federation and Block Pool ID
  • Tracing HDFS Blocks
  • Name Node Sizing
  • Memory calculations for HDFS Metadata
  • Selecting the optimal Block Size
  • Secondary Name Node
  • Checkpoint process in detail
  • Tracing a Map-Reduce Execution from Admin View
  • Logs and History Viewer

PIG - Basics

  • Introduction to PIG
  • Installation and Configuration
  • Basics of PIG
  • Pig Use Cases
  • Advanced PIG Join Types
  • Advanced PIG Latin Commands
  • PIG Macros and their Limitations
  • Typical Issues with PIG
  • When to use PIG and When not to?

Hadoop Administration - Maintenance

  • Hadoop Maintenance
  • Logging and Audit Trails
  • File system Maintenance
  • Backup and Restore
  • DistCp
  • Balancing
  • Failure Handling
  • Map-Reduce System Maintenance
  • Upgrades
  • Performance Benchmarking and Test
  • Hadoop Cluster Monitoring
  • Installation of Nagios and Ganglia
  • Configuring Nagios and Ganglia
  • Collecting Hadoop Metrics
  • REST interface for metrics collection
  • JMX JSON Servlet
  • Cluster Health Monitoring
  • Configuring Alerts for Clusters
  • Overall Cluster Health Monitoring
  • Introduction to Cloudera Manager

Oozie

  • Introduction and Architecture
  • Installation and Configurations
  • Oozie Workflows
  • Running workflows in Oozie with HIVE, Map-Reduce, PIG, Sqoop
  • Coordinator Jobs
  • Bundle Jobs
  • Different patterns in Oozie Scheduling
  • How to troubleshoot in Oozie
  • How to handle different libraries in Oozie
  • Hands on example with Oozie

HUE

  • Architecture and Fundamentals
  • Installing and Configuring HUE
  • Executing PIG, HIVE, Map-Reduce through HUE using Oozie
  • Various features of HUE
  • Integration of HUE users with Enterprise Identity Management systems

End to End POC

  • End to End POC Design
  • Live Example of end to end POC which has all ecosystem components

MapReduce Fundamentals

  • What is Map-Reduce
  • Examples of Map-Reduce Programs
  • How to think in Map-Reduce
  • What is feasible in Map-Reduce and What is not?
  • End to End flow of Map-Reduce in Hadoop

Hadoop Administration - Advanced Configurations

  • Hadoop Advanced Configurations
  • High Availability of Name Node
  • Hadoop Security
  • NameNode Safemode
  • Distcp commands in Hadoop
  • File Formats in Hadoop (RC, ORC, Sequence File, AVRO etc)

Hadoop Ecosystem Components

  • Hadoop Ecosystem Components
  • Role of each ecosystem component
  • How does it all fit together


  • HUE
  • Introduction
  • HUE Installation and Configuration
  • Using HUE
  • Zookeeper
  • Introduction
  • Installation and Configurations
  • Examples in Zookeeper
  • Sqoop
  • Introduction to Sqoop
  • Installation and Configuration
  • Examples for Sqoop

Hadoop Advanced

  • Advanced Developer for Hadoop
  • Java API for HDFS Interactions
  • File Read and Write to HDFS
  • WebHDFS API and interacting with Hadoop using WebHDFS
  • Different protocols used for interacting with HDFS
  • Hadoop RPC and security around RPC
  • Communication between Client and Data Node
  • Hands on Examples with different file format write in HDFS

Advanced MapReduce

  • Hadoop Map-Reduce API
  • InputFormat and Record Readers
  • Splittable and Non Splittable Files
  • Mappers
  • Combiners
  • Partitioners
  • Sorters
  • Reducers
  • OutputFormats and Record Writers
  • Implementing custom Input Formats for PST and PDF
  • MapReduce Execution Framework
  • Counters
  • Inside MapReduce Daemons
  • Failure Handling
  • Speculative Execution

Flume

  • Introduction
  • Installation and Configurations
  • Running flume examples with HIVE, Hbase etc
  • Flume Architecture
  • Complex and Multiplexing Flows in Flume
  • Configuring and running flume agents for the various supported sources (NetCat, JMS, Exec, Thrift, AVRO)
  • Configuring and running flume agents with various supported sinks (HDFS, Logger, AVRO, Hbase, FileRoll, ElasticSearch etc)
  • Understanding Batch load to HDFS
  • Example with Flume in real project scenarios
  • Log Analytics
  • Typical challenges with Flume operations
  • Integration with HIVE and Hbase
  • Implementing Custom Flume Sources and Sinks
  • Flume Security with Kerberos
