Hadoop Administration - Comprehensive

In the Hadoop Administration - Comprehensive workshop, delegates will learn:

  • Hadoop, HDFS, and its ecosystem.
  • Data loading techniques using Sqoop and Flume.
  • How to plan, implement, manage, monitor, and secure a Hadoop cluster.
  • How to configure backup options and diagnose and recover from node failures in a Hadoop cluster.
  • The ZooKeeper service.
  • How to secure a deployment and perform backup and recovery.
  • HBase, Oozie, Hive, and Hue.

The Hadoop Administration - Comprehensive training course presents all the small building blocks with thorough coverage of each component in the Hadoop administration stack. We begin with Hadoop’s architecture and its underlying parts, identifying top-down how components interact within the Hadoop ecosystem. The course then provides in-depth coverage of the Hadoop Distributed File System (HDFS), HBase, MapReduce, Oozie, Pig, and Hive.

PREREQUISITES

  • Good knowledge of Linux is required.
  • Fundamental Linux system administration skills are preferable: scripting (Perl/Bash), good troubleshooting skills, and an understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networking.
  • No prior knowledge of Apache Hadoop and Hadoop Clusters is required.

WHO SHOULD ATTEND

Administrators who are interested in learning how to deploy and manage a Hadoop cluster.

Production support Database Administrators, Development Database Administrators, System Administrators, Software Architects, Data Warehouse Professionals, IT Managers, Software Developers, and anyone else interested in learning Hadoop cluster administration should attend this course.

COURSE AGENDA

  • Hadoop server roles and their usage
  • Rack Awareness
  • Anatomy of Write and Read
  • Replication Pipeline
  • Data Processing
  • Hadoop Installation and Initial Configuration
  • Deploying Hadoop in pseudo-distributed mode
  • Deploying a multi-node Hadoop cluster
  • Installing Hadoop Clients
  • Hive
  • Pig
  • Hue
  • Apache Hadoop
  • HDFS
  • Getting Data into HDFS
  • MapReduce
  • Hadoop Cluster
  • Configuring Secondary NameNode
  • Hadoop 2.0
  • YARN framework
  • MRv2
  • Hadoop 2.0 Cluster setup
  • Deploying Hadoop 2.0 in pseudo-distributed mode
  • Deploying a multi-node Hadoop 2.0 cluster
  • Planning the Hadoop Cluster
  • Cluster Size
  • Hardware and Software considerations
  • Managing and Scheduling Jobs
  • Types of schedulers in Hadoop
  • Configuring the schedulers and running MapReduce jobs
  • Cluster Monitoring
  • Troubleshooting
  • Cloudera Hadoop Manager
  • Configuring rack awareness
  • Setting up Hadoop backup
  • Managing data nodes in a cluster
  • Setting up quotas
  • Upgrading a Hadoop cluster
  • Copying data across clusters using DistCp
  • Diagnostics and Recovery
  • Cluster Maintenance
  • HDFS Federation
  • Service Monitoring
  • Service and Log Management
  • Auditing and Alerts
  • Basics of Hadoop Platform Security
  • Securing the Platform
  • Kerberos
  • Oozie and Hive administration
  • HBase
  • Advanced HBase
  • HBase and Hive Integration
  • Understanding the Problem
  • Plan
  • Design
  • Create a Hadoop Cluster
  • Set up and configure commonly used Hadoop ecosystem components such as Pig and Hive
  • Configure Ganglia/Kibana on the Hadoop cluster and troubleshoot common cluster problems
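
Several of the agenda topics above (deploying Hadoop in pseudo-distributed mode, Hadoop cluster setup) come down to a handful of configuration properties. As a minimal sketch, the two fragments below are the usual starting point for a single-node, pseudo-distributed Hadoop 2.x deployment; port 9000 and a replication factor of 1 are the conventional single-node choices, not requirements:

```xml
<!-- core-site.xml: point the default filesystem at HDFS on this host -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: a single node cannot hold multiple replicas, so use 1 -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

A multi-node cluster changes mainly the `fs.defaultFS` host (the NameNode) and restores `dfs.replication` to its default of 3.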
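
The rack awareness topics center on HDFS's topology script: an executable that Hadoop invokes (configured via the `net.topology.script.file.name` property in Hadoop 2.x) with one or more DataNode IPs, expecting one rack path per line on stdout. A minimal sketch in Python, where the two-rack layout keyed on the IP's third octet is purely a hypothetical example:

```python
#!/usr/bin/env python3
"""Toy HDFS topology script: maps each host/IP argument to a rack path.

Hadoop passes DataNode addresses as command-line arguments and reads
one rack path per line from stdout. The subnet-to-rack mapping below
is illustrative only.
"""
import sys

# Hypothetical layout: the IP's third octet selects the rack.
RACKS = {"1": "/dc1/rack1", "2": "/dc1/rack2"}
DEFAULT_RACK = "/default-rack"


def resolve(addr):
    """Return the rack path for one address; unknown hosts get the default."""
    parts = addr.split(".")
    if len(parts) == 4:
        return RACKS.get(parts[2], DEFAULT_RACK)
    return DEFAULT_RACK


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(resolve(arg))
```

With this in place, HDFS can keep one block replica on a different rack, which is what makes the replication pipeline survive a whole-rack failure.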
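
The cluster planning and sizing topics reduce to simple capacity arithmetic: raw disk must cover the data volume times the HDFS replication factor, plus headroom for intermediate MapReduce output. A back-of-the-envelope sketch; the 3x replication default is standard HDFS, while the 25% scratch-space headroom is an illustrative planning assumption, not a fixed rule:

```python
import math


def raw_storage_needed_tb(data_tb, replication=3, temp_overhead=0.25):
    """Raw cluster storage: replicated data plus scratch-space headroom."""
    return data_tb * replication * (1 + temp_overhead)


def nodes_needed(data_tb, disk_per_node_tb, replication=3, temp_overhead=0.25):
    """Smallest node count whose combined disks cover the raw requirement."""
    raw = raw_storage_needed_tb(data_tb, replication, temp_overhead)
    return math.ceil(raw / disk_per_node_tb)
```

For example, 100 TB of data at 3x replication with 25% headroom needs 375 TB raw, or 16 nodes at 24 TB of disk each; real plans also factor in growth rate, compression, and CPU/memory per node.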