Hadoop Administration

Course Summary

The Hadoop Administration training course is designed to provide a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster.

The course begins with an introduction to Hadoop, including a brief history, its core components, and fundamental concepts. Next, it covers cluster planning, the Hadoop Distributed File System (HDFS), and MapReduce. The course concludes with advanced cluster configuration, cluster maintenance, monitoring, and troubleshooting.

Purpose: Learn how to maintain and operate a Hadoop cluster.
Audience: Anyone in charge of installing and managing a Hadoop cluster. Basic Unix skills recommended.
Role: Software Developer - System Administrator - Technical Manager
Skill Level: Intermediate
Style: Workshops
Duration: 4 Days
Related Technologies: Hadoop | Apache

 

Productivity Objectives
  • Discover the fundamentals of standing up a Hadoop cluster
  • Identify how to configure Hadoop for high availability
  • Apply sound baseline configurations to get the most out of Hadoop operations

What You'll Learn:

In the Hadoop Administration training course, you'll learn:
  • Hadoop Introduction
    • A Brief History of Hadoop
    • Core Hadoop Components
    • Fundamental Concepts
  • Planning Your Hadoop Cluster
    • General Planning Considerations
    • Choose Hardware
    • Network Considerations
    • Configure Nodes
    • Plan for Cluster Management
  • HDFS
    • HDFS Features
    • Write and Read Files
    • NameNode Considerations
    • HDFS Security
    • NameNode Web UI
    • Hadoop File Shell
  • Getting Data into HDFS
    • Pull Data from External Sources with Flume
    • Import Data from Relational Databases with Sqoop
    • REST Interfaces
    • Best Practices
  • MapReduce
    • MapReduce Overview
    • Features of MapReduce
    • Architectural Overview
    • YARN - MapReduce Version 2
    • Failure Recovery
    • The JobTracker Web UI
  • Hadoop Installation and Initial Configuration
    • Configuration & Deployment Types
    • Install Hadoop
    • Specify the Hadoop Configuration
    • Initial HDFS & MapReduce Configuration
    • Log Files
  • Installing/Configuring Hive, Impala, and Pig
    • Hive
    • Impala
    • Pig
  • Hadoop Clients
    • What is a Hadoop Client?
    • Install and Configure Hadoop Clients
    • Install and Configure Hue
    • Hue Authentication and Configuration
  • Advanced Cluster Configuration
    • Advanced Configuration Parameters
    • Configure Hadoop Ports
    • Explicitly Include and Exclude Hosts
    • Configure HDFS for Rack Awareness and High Availability
  • Hadoop Security
    • Why Hadoop Security Is Important
    • Hadoop's Security System Concepts
    • What Kerberos Is and How It Works
    • Secure a Hadoop Cluster with Kerberos
  • Managing and Scheduling Jobs
    • Manage Running Jobs
    • Schedule Hadoop Jobs
    • Configure the FairScheduler
  • Cluster Maintenance
    • Check HDFS status
    • Copy Data Between Clusters
    • Add/Remove Cluster Nodes
    • Rebalance the Cluster
    • NameNode Metadata Backup
    • Cluster Upgrades
  • Cluster Monitoring and Troubleshooting
    • General System Monitoring
    • Manage Hadoop's Log Files
    • Monitor the Clusters
    • Common Troubleshooting Issues
“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware

