Introduction to Administering Hadoop Clusters

Course Summary

The Introduction to Administering Hadoop Clusters training course demonstrates the key aspects of installing and maintaining a Hadoop cluster in its various forms.

The course begins by examining how to operate the Hadoop Distributed File System (HDFS) and the MapReduce I/O framework as complementary technologies. Next, it explores how to configure and monitor the processes that manage storage and job tasks. The course concludes by examining how to supplement clusters with enhanced storage features and client tools.

Purpose
Learn how to set up, configure, and administer Hadoop.
Audience
System administrators, developers, and DevOps engineers creating Big Data solutions using Hadoop.
Role
Software Developer - System Administrator
Skill Level
Intermediate
Style
Hack-a-thon - Learning Spikes - Workshops
Duration
4 Days
Related Technologies
Java | Big Data Training | Hadoop | Apache

 

Productivity Objectives
  • Describe HDFS and the MapReduce I/O framework
  • Configure and monitor storage management processes and tasks
  • Add network topology awareness to a cluster
  • Configure a highly available storage system
  • Supplement clusters with enhanced storage features and client tools

What You'll Learn:

In the Introduction to Administering Hadoop Clusters training course, you'll learn:
  • Hadoop Concepts (see the block-layout sketch after this outline)
    • Operate on large data sets
    • Parallelize to improve performance
    • Use large block sizes
    • Distribute and replicate data
    • Assign code to data
    • Compensate for node failures and recoveries
    • Add nodes for better performance
    • Utilize virtualization for rapid deployment
  • Installing a Hadoop Cluster
    • Understand the NameNode
    • Understand the secondary NameNode
    • Understand the DataNode
    • Understand the JobTracker
    • Understand the TaskTracker
  • Understanding the MapReduce Flow (see the word-count sketch after this outline)
    • Map data
    • Shuffle and sort data
    • Reduce data
    • Utilize the "Write Once, Read Many" approach
  • Reviewing Job Performance
    • Interpret console output
    • Navigate the JobTracker UI
    • Use TaskTracker logs
  • Configuring Nodes (see the configuration-reading sketch after this outline)
    • Understand Hadoop property management
    • Manage core properties
    • Manage HDFS properties
    • Manage MapReduce properties
    • Manage worker properties
    • Restrict job property changes
  • Supporting Federated & HA File Systems
    • Restore NameNode services
    • Protect NameNode metadata
    • Use Federated NameNodes
    • Understand the NameNode HA model
    • Configure a Federated HDFS system
    • Create defensive copies of NameNode metadata
  • Controlling Jobs and Resources
    • Schedule jobs
    • Understand the FairScheduler
    • Orchestrate Workflows
  • Importing Legacy and Continuous Data
    • Utilize Sqoop
    • Use Flume
    • Understand Hive, Impala, and HBase
  • Maintaining HDFS
    • Check block integrity
    • Balance data across nodes
    • Use HDFS Safe Mode
    • Address other HDFS systems
    • Restrict node additions
  • Installing Ecosystem Packages
    • Install Pig
    • Install Hive
    • Review HBase requirements
  • Improving Hadoop Security (see the Kerberos login sketch after this outline)
    • Review authentication & authorization
    • Analyze Hadoop's authorization model
    • Understand Kerberos architecture
    • Review Kerberos implementation options
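
To make the block-size and replication ideas in the Hadoop Concepts topic concrete, here is a minimal Java sketch (not part of the course materials) that asks HDFS for a file's block size, replication factor, and block locations. The NameNode address hdfs://namenode.example.com:8020 and the path /data/sample.txt are placeholder assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLayout {
        public static void main(String[] args) throws Exception {
            // Placeholder NameNode address; on a real cluster this comes from core-site.xml.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/data/sample.txt");   // hypothetical file

            FileStatus status = fs.getFileStatus(file);
            System.out.println("Block size:  " + status.getBlockSize());
            System.out.println("Replication: " + status.getReplication());

            // Each block is stored on several DataNodes; the NameNode tracks the mapping.
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("Block at offset " + block.getOffset()
                        + " on hosts " + String.join(", ", block.getHosts()));
            }
            fs.close();
        }
    }
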
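As a rough illustration of the map, shuffle-and-sort, and reduce phases named in the MapReduce Flow topic, the following is a minimal word-count sketch written against Hadoop's org.apache.hadoop.mapreduce API; it is an illustration, not the course's lab code.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {

        // Map phase: emit (word, 1) for every token in the input split.
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);
                    }
                }
            }
        }

        // Reduce phase: the framework has already shuffled and sorted by key,
        // so each call receives one word together with all of its counts.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }
    }

A driver built with Job.getInstance(), setMapperClass(), and setReducerClass() would wire these classes into a runnable job; the framework performs the shuffle and sort between the two phases.
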
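Hadoop property management, covered under Configuring Nodes, layers the site files (core-site.xml, hdfs-site.xml, mapred-site.xml) over compiled-in defaults. The short sketch below reads a few well-known core, HDFS, and MapReduce properties through the Java Configuration API; the /etc/hadoop/conf paths are an assumption, and exact property names vary between Hadoop versions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class ShowProperties {
        public static void main(String[] args) {
            // new Configuration() loads core-default.xml and core-site.xml from the classpath;
            // addResource() layers further site files on top (paths are assumptions).
            Configuration conf = new Configuration();
            conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
            conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));

            // Core, HDFS, and MapReduce properties, with fallback defaults if unset.
            System.out.println("fs.defaultFS             = " + conf.get("fs.defaultFS", "file:///"));
            System.out.println("dfs.replication          = " + conf.getInt("dfs.replication", 3));
            System.out.println("dfs.blocksize            = " + conf.get("dfs.blocksize", "134217728"));
            System.out.println("mapreduce.framework.name = " + conf.get("mapreduce.framework.name", "local"));
        }
    }
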
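On a Kerberos-secured cluster, the subject of the Improving Hadoop Security topic, a client must authenticate before any HDFS or MapReduce call succeeds. Below is a minimal login sketch using Hadoop's UserGroupInformation API; the principal and keytab path are hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLogin {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Switch Hadoop's RPC layer from "simple" to Kerberos authentication.
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Hypothetical principal and keytab; real values come from your KDC administrator.
            UserGroupInformation.loginUserFromKeytab(
                    "hdfs-admin@EXAMPLE.COM", "/etc/security/keytabs/hdfs-admin.keytab");

            // Subsequent HDFS calls run as the authenticated principal.
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Authenticated as: " + UserGroupInformation.getCurrentUser());
            System.out.println("Home directory:   " + fs.getHomeDirectory());
            fs.close();
        }
    }
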
“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware

Dive in and learn more

When transforming your workforce, it's important to have expert advice and tailored solutions. We can help. Tell us your unique needs and we'll explore ways to address them.
