Introduction to Administering Hadoop Clusters

Course Summary

The Introduction to Administering Hadoop Clusters training course demonstrates the key aspects of installing and maintaining a Hadoop cluster in its various forms.

The course begins by examining how to operate the Hadoop Distributed File System (HDFS) and the MapReduce I/O framework as complementary technologies. Next, it explores how to configure and monitor the processes that manage storage and job tasks. The course concludes by examining how to supplement clusters with enhanced storage features and client tools.

Purpose
Learn how to set up, configure, and administer Hadoop.
Audience
System administrators, developers, and DevOps engineers creating Big Data solutions using Hadoop.
Role
Software Developer - System Administrator
Skill Level
Intermediate
Style
Hack-a-thon - Learning Spikes - Workshops
Duration
4 Days
Related Technologies
Java | Big Data Training | Hadoop | Apache

 

Productivity Objectives
  • Describe HDFS and the MapReduce I/O framework
  • Configure and monitor storage management processes and tasks
  • Add network topology awareness to a cluster
  • Configure a highly available storage system
  • Supplement clusters with enhanced storage features and client tools

What You'll Learn:

In the Introduction to Administering Hadoop Clusters training course, you'll learn:
  • Hadoop Concepts (see the block-layout sketch after this outline)
    • Operate on large data sets
    • Parallelize to improve performance
    • Use large block sizes
    • Distribute and replicate data
    • Assign code to data
    • Compensate for node failures and recoveries
    • Add nodes for better performance
    • Utilize virtualization for rapid deployment
  • Installing a Hadoop Cluster
    • Understand the NameNode
    • Understand the secondary NameNode
    • Understand the DataNode
    • Understand the JobTracker
    • Understand the TaskTracker
  • Understanding the MapReduce Flow (see the word-count sketch after this outline)
    • Map data
    • Shuffle and sort data
    • Reduce data
    • Utilize the "Write Once, Read Many" approach
  • Reviewing Job Performance
    • Interpret console output
    • Navigate the JobTracker UI
    • Use TaskTracker logs
  • Configuring Nodes (see the configuration-reading sketch after this outline)
    • Understand Hadoop property management
    • Manage core properties
    • Manage HDFS properties
    • Manage MapReduce properties
    • Manage worker properties
    • Restrict job property changes
  • Supporting Federated & HA File Systems
    • Restore NameNode services
    • Protect NameNode metadata
    • Use Federated NameNodes
    • Understand the NameNode HA model
    • Configure a Federated HDFS system
    • Create defensive copies of NameNode metadata
  • Controlling Jobs and Resources
    • Schedule jobs
    • Understand the FairScheduler
    • Orchestrate Workflows
  • Importing Legacy and Continuous Data
    • Utilize Sqoop
    • Use Flume
    • Understand Hive, Impala, and HBase
  • Maintaining HDFS
    • Check block integrity
    • Balance data across nodes
    • Use HDFS Safe Mode
    • Address other HDFS systems
    • Restrict node additions
  • Installing Ecosystem Packages
    • Install Pig
    • Install Hive
    • Review HBase requirements
  • Improving Hadoop Security (see the Kerberos login sketch after this outline)
    • Review authentication & authorization
    • Analyze Hadoop's authorization model
    • Understand Kerberos architecture
    • Review Kerberos implementation options
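
To make the block-size and replication ideas in the Hadoop Concepts topic concrete, here is a minimal Java sketch (not part of the course materials) that asks HDFS for a file's block size, replication factor, and block locations. The NameNode address hdfs://namenode.example.com:8020 and the path /data/sample.txt are placeholder assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLayout {
        public static void main(String[] args) throws Exception {
            // Placeholder NameNode address; on a real cluster this comes from core-site.xml.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/data/sample.txt");   // hypothetical file

            FileStatus status = fs.getFileStatus(file);
            System.out.println("Block size:  " + status.getBlockSize());
            System.out.println("Replication: " + status.getReplication());

            // Each block is stored on several DataNodes; the NameNode tracks the mapping.
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("Block at offset " + block.getOffset()
                        + " on hosts " + String.join(", ", block.getHosts()));
            }
            fs.close();
        }
    }
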
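As a rough illustration of the map, shuffle-and-sort, and reduce phases named in the MapReduce Flow topic, the following is a minimal word-count sketch written against Hadoop's org.apache.hadoop.mapreduce API; it is an illustration, not the course's lab code.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {

        // Map phase: emit (word, 1) for every token in the input split.
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);
                    }
                }
            }
        }

        // Reduce phase: the framework has already shuffled and sorted by key,
        // so each call receives one word together with all of its counts.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }
    }

A driver built with Job.getInstance(), setMapperClass(), and setReducerClass() would wire these classes into a runnable job; the framework performs the shuffle and sort between the two phases.
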
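Hadoop property management, covered under Configuring Nodes, layers the site files (core-site.xml, hdfs-site.xml, mapred-site.xml) over compiled-in defaults. The short sketch below reads a few well-known core, HDFS, and MapReduce properties through the Java Configuration API; the /etc/hadoop/conf paths are an assumption, and exact property names vary between Hadoop versions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class ShowProperties {
        public static void main(String[] args) {
            // new Configuration() loads core-default.xml and core-site.xml from the classpath;
            // addResource() layers further site files on top (paths are assumptions).
            Configuration conf = new Configuration();
            conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
            conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));

            // Core, HDFS, and MapReduce properties, with fallback defaults if unset.
            System.out.println("fs.defaultFS             = " + conf.get("fs.defaultFS", "file:///"));
            System.out.println("dfs.replication          = " + conf.getInt("dfs.replication", 3));
            System.out.println("dfs.blocksize            = " + conf.get("dfs.blocksize", "134217728"));
            System.out.println("mapreduce.framework.name = " + conf.get("mapreduce.framework.name", "local"));
        }
    }
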
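On a Kerberos-secured cluster, the subject of the Improving Hadoop Security topic, a client must authenticate before any HDFS or MapReduce call succeeds. Below is a minimal login sketch using Hadoop's UserGroupInformation API; the principal and keytab path are hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLogin {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Switch Hadoop's RPC layer from "simple" to Kerberos authentication.
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Hypothetical principal and keytab; real values come from your KDC administrator.
            UserGroupInformation.loginUserFromKeytab(
                    "hdfs-admin@EXAMPLE.COM", "/etc/security/keytabs/hdfs-admin.keytab");

            // Subsequent HDFS calls run as the authenticated principal.
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Authenticated as: " + UserGroupInformation.getCurrentUser());
            System.out.println("Home directory:   " + fs.getHomeDirectory());
            fs.close();
        }
    }
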
“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware

Dive in and learn more

When transforming your workforce, it's important to have expert advice and tailored solutions. We can help. Tell us your unique needs and we'll explore ways to address them.
