Introduction to Administering Hadoop Clusters

Advanced Hadoop Administration

The Introduction to Administering Hadoop Clusters training course includes one or more labs to reinforce and extend the topics under discussion, including a review of example configurations and run-time reports.

The Hadoop administration course focuses on the key aspects of installing and maintaining a Hadoop cluster in various forms. During the course you will learn to operate the HDFS file system and MapReduce I/O framework as complementary technologies; configure and monitor processes to manage storage and job tasks; add network topology awareness to a cluster; configure a federated or highly available storage system; and supplement the cluster with enhanced storage features and client tools.

This course can be extended to five days if additional coverage is needed in the following areas:
a) writing MapReduce jobs
b) managing job properties
c) overviews on ecosystem projects such as Hive, Pig, Impala, and HBase
d) lab integration with an existing in-house cluster

Course Summary

Learn how to set up, configure, and administer Hadoop.
Audience: System administrators, developers, and DevOps engineers creating Big Data solutions using Hadoop.
Skill Level: 
Learning Style: Hands-on

Hands-on training is customized, instructor-led training with an in-depth presentation of a technology and its concepts, featuring such topics as Java, OOAD, and Open Source.

Duration: 4 Days
Productivity Objectives: 
  • Describe the HDFS file system and MapReduce I/O frameworks
  • Configure and monitor storage management processes and tasks
  • Add network topology awareness to a cluster
  • Configure a highly-available storage system
  • Supplement the cluster with enhanced storage features and client tools
Introduction to Administering Hadoop Clusters is part of the Apache Training curriculum.

What You'll Learn

  • Hadoop Concepts
    • Operating on Large Data Sets
    • Parallelizing to Improve Performance
    • Using Large Block Sizes
    • Distributing & Replicating Data
    • Assigning Code to Data
    • Compensating for Node Failures and Recoveries
    • Adding Nodes for Better Performance
    • Using Virtualization for Rapid Deployment
  • Installing a Hadoop Cluster
    • Understanding the NameNode
    • Understanding the Secondary NameNode
    • Understanding the Data Node
    • Understanding the JobTracker
    • Understanding the TaskTracker
  • Understanding the MapReduce Flow
    • Mapping Data
    • Shuffling and Sorting
    • Reducing Data
    • Using the Write Once, Read Many Approach
  • Reviewing Job Performance
    • Interpreting Console Output
    • Navigating the JobTracker UI
    • Using TaskTracker Logs
  • Configuring Nodes
    • Understanding Hadoop Property Management
    • Managing Core Properties
    • Managing HDFS Properties
    • Managing MapReduce Properties
    • Managing Worker Properties
    • Restricting Job Property Changes
  • Supporting Federated & HA File Systems
    • Restoring NameNode Services
    • Protecting NameNode Metadata
    • Using Federated NameNodes
    • Understanding the NameNode HA Model
      Lab: Configure a Federated HDFS system
      Lab: Create defensive copies of NameNode metadata
      Alt: Configure an HA NameNode for manual failover
  • Controlling Jobs and Resources
    • Scheduling Jobs
    • Understanding the FairScheduler
    • Orchestrating Workflows
  • Importing Legacy and Continuous Data
    • Using Sqoop
    • Using Flume
    • Understanding Hive, Impala, and HBase
  • Maintaining HDFS
    • Checking Block Integrity
    • Balancing Data Across Nodes
    • Using HDFS Safe Mode
    • Addressing Other HDFS Systems
    • Restricting Node Additions
  • Installing Ecosystem Packages
    • Installing Pig
    • Installing Hive
    • Reviewing HBase Requirements
  • Improving Hadoop Security
    • Reviewing Authentication & Authorization
    • Understanding Hadoop’s Authorization Model
    • Understanding Kerberos Architecture
    • Reviewing Kerberos Implementation Options
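The "Configuring Nodes" unit above covers Hadoop's per-file property management. As a rough illustration of the HDFS properties discussed there (the values shown are examples only, not recommendations), replication factor, block size, and NameNode metadata directories are set in hdfs-site.xml:

```xml
<!-- hdfs-site.xml: illustrative values only; tune for your cluster -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value> <!-- default replication factor for new files -->
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value> <!-- 128 MB block size -->
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <!-- redundant metadata directories protect NameNode state; paths are hypothetical -->
    <value>file:///data/1/dfs/nn,file:///data/2/dfs/nn</value>
  </property>
</configuration>
```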
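The "Maintaining HDFS" unit lists block-integrity checks, rebalancing, and Safe Mode; these map onto standard hdfs CLI operations. A sketch of the relevant commands against a running cluster (the path and threshold are examples):

```shell
# Check block integrity: report files, blocks, and replica locations under a path
hdfs fsck / -files -blocks -locations

# Rebalance data until each DataNode's usage is within 10% of the cluster mean
hdfs balancer -threshold 10

# Query and control Safe Mode (the read-only state used at startup/maintenance)
hdfs dfsadmin -safemode get
hdfs dfsadmin -safemode enter
hdfs dfsadmin -safemode leave
```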
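The course description and the objectives both mention adding network topology awareness to a cluster. In Hadoop this is typically done by pointing the net.topology.script.file.name property (in core-site.xml) at a script that maps host names or IP addresses to rack paths. A minimal sketch, with a purely hypothetical subnet-to-rack mapping you would replace with your own:

```shell
#!/usr/bin/env bash
# Minimal rack-topology script sketch (hypothetical subnet-to-rack mapping).
# Hadoop invokes the script named by net.topology.script.file.name with one
# or more host/IP arguments and expects one rack path per argument on stdout.
rack_for() {
  case "$1" in
    10.1.1.*) echo "/dc1/rack1" ;;
    10.1.2.*) echo "/dc1/rack2" ;;
    *)        echo "/default-rack" ;;  # fallback for unknown hosts
  esac
}

for host in "$@"; do
  rack_for "$host"
done
```

With topology in place, HDFS can spread block replicas across racks and MapReduce can prefer rack-local task placement.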

Meet Your Instructor


Michael is a practicing software developer, course developer, and trainer with DevelopIntelligence. For the majority of his career, Michael has designed and implemented large-scale, enterprise-grade, Java-based applications at major telecommunications and Internet companies, such as Level3 Communications, US West/Qwest/Century Link, Orbitz, and others.

Michael has a passion for learning new technologies, patterns, and paradigms (or, he has a tendency to get bored or disappointed with current ones)....

Meet Michael »

Mark is an experienced, hands-on Big Data architect. He has been developing software for over 20 years in a variety of technologies (enterprise, web, HPC) and for a variety of verticals (healthcare, O&G, legal, financial). He currently focuses on Hadoop, Big Data, NoSQL, and Amazon Cloud Services. Mark has delivered Hadoop training for individuals and corporations; his classes are hands-on and draw heavily on his industry experience.
Mark stays active in the...

Meet Mark »

Rich is a full-stack generalist with a deep and wide background in the architecture, development, and maintenance of web-scale, mission-critical custom applications, and in building and leading extraordinary technology teams.

He has spent about equal thirds of his two decade career in the Fortune 500, government, and start-up arenas, where he’s served as everything from the trench-level core developer to VP of Engineering. He currently spends the majority of his time sharing his knowledge about Amazon Web...

Meet Rich »

Sujee has been developing software for 15 years. For the last few years he has been consulting on and teaching Hadoop, NoSQL, and cloud technologies.
Sujee stays active in the Hadoop and open source community. He runs a developer-focused meetup and Hadoop hackathons called ‘Big Data Gurus’, and has presented at a variety of meetups.
Sujee contributes to the Hadoop project and other open source projects. He writes about Hadoop and other technologies...

Meet Sujee »

Get Custom Training Quote

We'll work with you to design a custom Introduction to Administering Hadoop Clusters training program that meets your specific needs. A 100% guaranteed plan that works for you, your team, and your budget.

Learn More

Chat with one of our Program Managers from our Boulder, Colorado office to discuss various training options.

DevelopIntelligence has been in the technical/software development learning and training industry for nearly 20 years. We’ve provided learning solutions to more than 48,000 engineers, across 220 organizations worldwide.

About Develop Intelligence
Need help finding the right learning solution?   Call us: 877-629-5631