Introduction to Hadoop Administration

The Introduction to Hadoop Administration training course will provide you with a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster, from installation and configuration through load balancing and tuning.

The course begins with an overview of the Big Data landscape, then dives into running Hadoop from a system administrator's point of view. You'll get experience with some of the most common and challenging scenarios Hadoop administrators see in the real world, and become familiar with the most up-to-date details of the platform.

This course requires prior knowledge of basic networking; a working knowledge of a Unix environment is helpful.

Course Summary

Purpose: 
Learn how to administer and maintain Hadoop.
Audience: 
System administrators, DevOps engineers, and software developers responsible for managing and maintaining Hadoop clusters.
Skill Level: 
Learning Style: 

Hands-on training is customized, instructor-led training with an in-depth presentation of a technology and its concepts, paired with guided practical exercises.

Duration: 
4 Days
Productivity Objectives: 
  • Understand the fundamental concepts of Hadoop
  • Know how to plan your Hadoop cluster
  • Understand HDFS features
  • Know how to get data into HDFS
  • Know how to work with MapReduce
  • Understand installation and configuration of Hadoop
  • Understand cluster maintenance
Introduction to Hadoop Administration is part of the Apache Training curriculum.

What You'll Learn

In the Introduction to Hadoop Administration training course you’ll learn:

  • Hadoop Introduction
    • A Brief History of Hadoop
    • Core Hadoop Components
    • Fundamental Concepts
  • Planning Your Hadoop Cluster
    • General Planning Considerations
    • Choosing Hardware
    • Network Considerations
    • Configuring Nodes
    • Planning for Cluster Management
  • HDFS
    • HDFS Features
    • Writing and Reading Files
    • NameNode Considerations
    • HDFS Security
    • NameNode Web UI
    • Hadoop File Shell
  • Getting Data into HDFS
    • Pulling data from External Sources with Flume
    • Importing Data from Relational Databases with Sqoop
    • REST Interfaces
    • Best Practices
  • MapReduce
    • MapReduce Overview
    • Features of MapReduce
    • Architectural Overview
    • YARN (MapReduce Version 2)
    • Failure Recovery
    • The JobTracker Web UI
  • Hadoop Installation and Initial Configuration
    • Deployment Types
    • Installing Hadoop
    • Specifying the Hadoop Configuration
    • Initial HDFS and MapReduce Configuration
    • Log Files
  • Installing/Configuring Hive, Impala, and Pig
    • Hive
    • Impala
    • Pig
  • Hadoop Clients
    • What is a Hadoop Client?
    • Installing and Configuring Hadoop Clients
    • Installing and Configuring Hue
    • Hue Authentication and Configuration
  • Advanced Cluster Configuration
    • Advanced Configuration Parameters
    • Configuring Hadoop Ports
    • Explicitly Including and Excluding Hosts
    • Configuring HDFS for Rack Awareness and HDFS High Availability
  • Hadoop Security
    • Why Hadoop Security is Important
    • Hadoop’s Security System Concepts
    • What Kerberos is and How it Works
    • Securing a Hadoop Cluster with Kerberos
  • Managing and Scheduling Jobs
    • Managing Running Jobs
    • Scheduling Hadoop Jobs
    • Configuring the FairScheduler
  • Cluster Maintenance
    • Checking HDFS Status
    • Copying Data Between Clusters
    • Adding/Removing Cluster Nodes
    • Rebalancing the Cluster
    • NameNode Metadata Backup
    • Cluster Upgrades
  • Cluster Monitoring and Troubleshooting
    • General System Monitoring
    • Managing Hadoop’s Log Files
    • Monitoring the Clusters
    • Common Troubleshooting Issues
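
As a taste of the Hadoop File Shell material in the outline above, the following sketch shows a few common `hdfs dfs` commands. The paths and file names are illustrative, and the commands assume a running HDFS cluster with the Hadoop binaries on your PATH:

```shell
# List the root of HDFS
hdfs dfs -ls /

# Create a home directory and copy a local file into HDFS
hdfs dfs -mkdir -p /user/alice
hdfs dfs -put access.log /user/alice/access.log

# Read the file back, or copy it to the local filesystem
hdfs dfs -cat /user/alice/access.log
hdfs dfs -get /user/alice/access.log ./access-copy.log

# Check space usage and remove the file when done
hdfs dfs -du -h /user/alice
hdfs dfs -rm /user/alice/access.log
```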
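Likewise, the cluster-maintenance topics (checking HDFS status, copying data between clusters, rebalancing) each map to a standard administrative command. A brief sketch, assuming a live cluster; the hostnames are examples only:

```shell
# Report DataNode status and overall cluster capacity
hdfs dfsadmin -report

# Check filesystem health, listing files and block locations
hdfs fsck / -files -blocks -locations

# Rebalance block placement until no DataNode deviates more
# than 10% from the cluster's average utilization
hdfs balancer -threshold 10

# Copy data between clusters with DistCp
hadoop distcp hdfs://nn1.example.com:8020/data hdfs://nn2.example.com:8020/backup
```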

Meet Your Instructor

Michael

Michael is a practicing software developer, course developer, and trainer with DevelopIntelligence. For the majority of his career, Michael has designed and implemented large-scale, enterprise-grade, Java-based applications at major telecommunications and Internet companies, such as Level3 Communications, US West/Qwest/Century Link, Orbitz, and others.

Michael has a passion for learning new technologies, patterns, and paradigms (or, he has a tendency to get bored or disappointed with current ones)....

Meet Michael »
Rich

Rich is a full-stack generalist with a deep and wide background in the architecture, development, and maintenance of web-scale, mission-critical custom applications, and in building and leading extraordinary technology teams.

He has spent about equal thirds of his two decade career in the Fortune 500, government, and start-up arenas, where he’s served as everything from the trench-level core developer to VP of Engineering. He currently spends the majority of his time sharing his knowledge about Amazon Web...

Meet Rich »
Mark

Mark is an experienced, hands-on Big Data architect. He has been developing software for over 20 years in a variety of technologies (enterprise, web, HPC) and for a variety of verticals (healthcare, O&G, legal, financial). He currently focuses on Hadoop, Big Data, NoSQL, and Amazon cloud services. Mark has been delivering Hadoop training to individuals and corporations; his classes are hands-on and draw heavily on his industry experience.
Mark stays active in the...

Meet Mark »
Sujee

Sujee has been developing software for 15 years. For the last few years he has been consulting on and teaching Hadoop, NoSQL, and cloud technologies. Sujee stays active in the Hadoop and open-source community. He runs a developer-focused meetup and Hadoop hackathons called 'Big Data Gurus'. He has presented at a variety of meetups. Sujee contributes to the Hadoop project and other open source projects. He writes about Hadoop and other technologies on his website.

Meet Sujee »
Andrew S

Andrew is a mathematician turned software engineer who loves building systems. After graduating with a PhD in pure math, he became fascinated by software startups and has since spent 20 years learning. During this period, he’s worked on a wide variety of projects and platforms, including big data analytics, enterprise optimization, mathematical finance, cross-platform middleware, and medical imaging.

In 2001, Andrew served as company architect at ProfitLogic, a pricing optimization startup...

Meet Andrew S »

Contact us to learn more

Not all training courses are created equal. Let the customization process begin! We'll work with you to design a custom Introduction to Hadoop Administration training course that meets your specific needs.

DevelopIntelligence has been in the technical/software development learning and training industry for nearly 20 years. We’ve provided learning solutions to more than 48,000 engineers, across 220 organizations worldwide.

About Develop Intelligence


Need help finding the right learning solution? Call us: 877-629-5631