Introduction to Hadoop for Developers

Course Summary

The Introduction to Hadoop for Developers training course is designed to demonstrate the fundamentals of setting up a Hadoop cluster, as well as the "soup" of related technologies like Hive, Pig and Oozie.

The course begins with an examination of how to access the Hadoop file system and write MapReduce jobs using Java, Pig, and Hive, and how to coordinate them with Oozie workflows. Next, the course discusses examples of real-world MapReduce jobs and how Hadoop has solved real-world data-intensive processing problems. The course concludes by exploring the different modes in which Hadoop can run, both to support massive amounts of data and to run students' MapReduce jobs during development.

Prerequisites: Basic Java knowledge is expected, and experience with Eclipse is a plus.

Purpose	Learn how to write MapReduce programs using Java.
Audience	System administrators, developers, and DevOps engineers creating Big Data solutions using Hadoop.
Role	Software Developer - System Administrator
Skill Level	Intermediate
Style	Hack-a-thon - Learning Spikes - Workshops
Duration	4 Days
Related Technologies	Java | Hadoop | Apache

Productivity Objectives

Discover the Hadoop Distributed File System (HDFS)
Interpret general Hadoop Cluster/HDFS administration
Explain MapReduce
Define how to write a MapReduce job with Java, Pig, and Hive
Differentiate how the different Hadoop technologies inter-operate to provide a cohesive big data solution
Demonstrate basic management of a Hadoop cluster
Give examples of how to perform basic unit testing of MapReduce jobs
Distinguish how Message Passing Interface (MPI) and High Performance Computing (HPC) intersect with Hadoop

What You'll Learn:

In the Introduction to Hadoop for Developers training course, you'll learn:

Hadoop Overview
- Big Data introduction
- History
- Comparison to relational databases
- Hadoop ecosystem
HDFS
- Architecture/Concepts
- Access
- Namenodes
- Filesystem Shell
- Access HDFS with Java
- Read/Write/Browse File System
- Basic HDFS Admin
HBase
- Overview
- Architecture
- Data Model
- Installation and Shell
- Access via Java API
- Scan API
- Filters
- Storage Model
- Table Design
MapReduce on YARN
- Introduction
- Process Models
- Command line tools
- MapReduce Framework
- Submit MapReduce jobs
- Write MapReduce jobs in Java
- MapReduce theory
- Distributed cache
- Speculative execution
- YARN components
- Counters
- Details of MapReduce job execution
Hadoop Streaming
- Implement a Streaming Job
- Counters in Streaming Jobs
- Contrast with Java jobs
MapReduce Workflows
- Problem Decomposition into MapReduce Jobs
- Code Workflows
- Use the JobControl class
Oozie
- Installation
- Writing Oozie Workflows
- Deploying and Running Oozie Jobs
Pig
- Installation
- Pig Latin
- Write Pig scripts
- User defined functions
- Data set joins
Hive
- Installation
- Table Creation and Deletion
- Partitions
- Load data into Hive
- Joins
- Buckets
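
To give a flavor of the MapReduce model named in the outline above, here is a minimal sketch of a word count written in plain Java. It is an illustration only: it runs in memory on a single machine and does not use the Hadoop API, and the class name WordCountSketch and its method names are hypothetical, not taken from the course materials. It walks through the three phases the model is usually described with: map, shuffle, and reduce.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of the MapReduce phases (no Hadoop dependency).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word found in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every emitted value under its key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values for one key into a single total.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle the pairs, then reduce each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            totals.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data big clusters", "data moves to the cluster");
        System.out.println(wordCount(lines));
    }
}
```

In a real Hadoop job the map and reduce steps would run as separate tasks across the cluster and the framework would perform the shuffle; the sketch only mirrors the data flow.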

Real-World Content

Project-focused demos and labs using your tool stack and environment, not some canned "training room" lab.

Expert Practitioners

Industry experts who bring their battle scars into the classroom.

Experiential Learning

More coding than lecture, coupled with architectural and design discussions.

Tailored Outlines

One-size-fits-all doesn't apply to training teams. That's where we come in!

“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware

Dive in and learn more

When transforming your workforce, it's important to have expert advice and tailored solutions. We can help. Tell us your unique needs and we'll explore ways to address them.

