Introduction to Hadoop for Developers

Course Summary

The Introduction to Hadoop for Developers training course is designed to demonstrate the fundamentals of setting up a Hadoop cluster, as well as the "soup" of related technologies such as Hive, Pig, and Oozie.

The course begins with an examination of how to access the Hadoop file system and write MapReduce jobs using Java, Pig, Hive, and Oozie. Next, the course discusses examples of real-world MapReduce jobs and how Hadoop has solved real-world data-intensive processing problems. The course concludes by exploring the different modes in which Hadoop can run, both to process massive amounts of data and to support students' MapReduce jobs during development.

Prerequisites: Basic Java knowledge is expected, and experience with Eclipse is a plus.

Purpose
Learn how to write MapReduce programs using Java.
Audience
System administrators, developers, and DevOps engineers creating Big Data solutions using Hadoop.
Role
Software Developer - System Administrator
Skill Level
Intermediate
Style
Hack-a-thon - Learning Spikes - Workshops
Duration
4 Days
Related Technologies
Java | Hadoop | Apache

Productivity Objectives
  • Discover the Hadoop Distributed File System (HDFS)
  • Interpret general Hadoop Cluster/HDFS administration
  • Explain MapReduce
  • Define how to write a MapReduce job with Java, Pig, and Hive
  • Describe how the different Hadoop technologies interoperate to provide a cohesive big data solution
  • Demonstrate basic management of a Hadoop cluster
  • Give examples of how to perform basic unit testing of MapReduce jobs
  • Distinguish how Message Passing Interface (MPI) and High Performance Computing (HPC) intersect with Hadoop

What You'll Learn:

In the Introduction to Hadoop for Developers training course, you'll learn:
  • Hadoop Overview
    • Big Data introduction
    • History
    • Comparison to relational databases
    • Hadoop ecosystem
  • HDFS
    • Architecture/Concepts
    • Access
    • Namenodes
    • Filesystem Shell
    • Access HDFS with Java
    • Read/Write/Browse File System
    • Basic HDFS Admin
  • HBase
    • Overview
    • Architecture
    • Data Model
    • Installation and Shell
    • Access via Java API
    • Scan API
    • Filters
    • Storage Model
    • Table Design
  • MapReduce on YARN
    • Introduction
    • Process Models
    • Command line tools
    • MapReduce Framework
    • Submit MapReduce jobs
    • Write MapReduce jobs in Java
    • MapReduce theory
    • Distributed cache
    • Speculative execution
    • YARN components
    • Counters
    • Details of MapReduce job execution
  • Hadoop Streaming
    • Implement a Streaming Job
    • Counters in Streaming Jobs
    • Contrast with Java jobs
  • MapReduce Workflows
    • Problem Decomposition into MapReduce Jobs
    • Code Workflows
    • Use the JobControl class
  • Oozie
    • Installation
    • Writing Oozie Workflows
    • Deploying and Running Oozie Jobs
  • Pig
    • Installation
    • Pig Latin
    • Write Pig scripts
    • User defined functions
    • Data set joins
  • Hive
    • Installation
    • Table Creation and Deletion
    • Partitions
    • Load data into Hive
    • Joins
    • Buckets

“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware
