Introduction to Hadoop for Developers

Hadoop Essentials

The Introduction to Hadoop for Developers training course teaches the fundamentals of setting up a Hadoop cluster, as well as the “soup” of related technologies like Hive, Pig and Oozie.

You will learn how to access the Hadoop file system (HDFS), write MapReduce jobs using Java, Pig, and Hive, and coordinate them with Oozie, working in hands-on workshops with your own installation of a single-node Hadoop 2 cluster. We will discuss examples of real-world MapReduce jobs and how Hadoop has solved real-world, data-intensive processing problems. Then we will explore the different modes in which Hadoop can run, both to support massive amounts of data and to support your MapReduce jobs during development.

Best of all, you will walk away with a fully configured virtual machine that runs under VirtualBox or VMware, with Hadoop and all related technologies installed, configured, and ready to run. The virtual machine includes the necessary development environment (Eclipse), so you can keep growing your Hadoop knowledge in a live environment without the hassle of setting one up from scratch.

Prerequisites: Basic Java knowledge (experience with Eclipse is a plus); we recommend courses in our core Java catalog. 

Course Summary

Purpose: Learn how to write MapReduce programs using Java.
Audience: System administrators, developers, and DevOps engineers creating Big Data solutions using Hadoop.
Skill Level: 
Learning Style: Hands-on

Hands-on training is customized, instructor-led training with an in-depth presentation of a technology and its concepts, featuring such topics as Java, OOAD, and Open Source.

Duration: 4 Days
Productivity Objectives: 
  • Understand the Hadoop File System (HDFS)
  • Understand general Hadoop cluster and HDFS administration
  • Know what MapReduce is and why you should care
  • Know how to write a MapReduce job with Java, Pig, and Hive
  • Understand how the different Hadoop technologies interoperate to provide a cohesive big data solution
  • Know basic management of a Hadoop cluster
  • Understand how to perform basic unit testing of your MapReduce jobs
  • Understand how MPI and HPC intersect with Hadoop
Introduction to Hadoop for Developers is part of the Apache Training curriculum.

What You'll Learn

In the Introduction to Hadoop for Developers training course, you’ll learn:

  • Hadoop Overview
    • Big Data Introduction
    • History
    • Comparison to Relational Databases
    • Hadoop Ecosystem
  • HDFS
    • Architecture/Concepts
    • Access
    • Namenodes
    • Filesystem Shell
    • Accessing HDFS with Java
    • Reading/Writing/Browsing File System
    • Basic HDFS Admin
  • HBase
    • Overview
    • Architecture
    • Data Model
    • Installation and Shell
    • Access via Java API
    • Scan API
    • Filters
    • Storage Model
    • Table Design
  • Map Reduce on YARN
    • Introduction
    • Processing Model
    • Command line tools
    • MapReduce Framework
    • Submitting MapReduce Jobs
    • Writing MapReduce Jobs in Java
    • MapReduce Theory
    • Distributed Cache
    • Speculative Execution
    • YARN Components
    • Counters
    • Details of MapReduce Job Execution
  • Hadoop Streaming
    • Implementing a Streaming Job
    • Counters in Streaming Jobs
    • Contrast with Java Jobs
  • MapReduce Workflows
    • Problem Decomposition into MapReduce Jobs
    • Coding Workflows
    • Using the JobControl Class
  • Oozie
    • Installation
    • Writing Oozie Workflows
    • Deploying and Running Oozie Jobs
  • Pig
    • Installation
    • Pig Latin
    • Writing Pig Scripts
    • User-Defined Functions
    • Data Set Joins
  • Hive
    • Installation
    • Table Creation and Deletion
    • Partitioning
    • Loading Data into Hive
    • Joins
    • Bucketing

Meet Your Instructors

Michael is a practicing software developer, course developer, and trainer with DevelopIntelligence. For the majority of his career, Michael has designed and implemented large-scale, enterprise-grade, Java-based applications at major telecommunications and Internet companies, such as Level 3 Communications, US West/Qwest/CenturyLink, Orbitz, and others.

Michael has a passion for learning new technologies, patterns, and paradigms (or, he has a tendency to get bored or disappointed with current ones)....

Rich is a full-stack generalist with a deep and wide background in the architecture, development, and maintenance of web-scale, mission-critical custom applications, and in building and leading extraordinary technology teams.

He has spent roughly equal thirds of his two-decade career in the Fortune 500, government, and start-up arenas, where he’s served as everything from a trench-level core developer to VP of Engineering. He currently spends the majority of his time sharing his knowledge about Amazon Web...

Mark is an experienced, hands-on Big Data architect. He has been developing software for over 20 years in a variety of technologies (enterprise, web, HPC) and for a variety of verticals (healthcare, oil and gas, legal, financial). He currently focuses on Hadoop, Big Data, NoSQL, and Amazon cloud services. Mark has been providing Hadoop training for individuals and corporations; his classes are hands-on and draw heavily on his industry experience.
Mark stays active in the...

Sujee has been developing software for 15 years. In the last few years he has been consulting and teaching Hadoop, NoSQL, and cloud technologies.
Sujee stays active in the Hadoop and open source community. He runs a developer-focused meetup called ‘Big Data Gurus’ and organizes Hadoop hackathons. He has presented at a variety of meetups.
Sujee contributes to the Hadoop project and other open source projects. He writes about Hadoop and other technologies...

Andrew S

Andrew is a mathematician turned software engineer who loves building systems. After graduating with a PhD in pure math, he became fascinated by software startups and has since spent 20 years learning. During this period, he’s worked on a wide variety of projects and platforms, including big data analytics, enterprise optimization, mathematical finance, cross-platform middleware, and medical imaging.

In 2001, Andrew served as company architect at ProfitLogic, a pricing optimization startup...

Get Custom Training Quote

We'll work with you to design a custom Introduction to Hadoop for Developers training program that meets your specific needs. A 100% guaranteed plan that works for you, your team, and your budget.

Chat with one of our Program Managers from our Boulder, Colorado office to discuss various training options.

DevelopIntelligence has been in the technical/software development learning and training industry for nearly 20 years. We’ve provided learning solutions to more than 48,000 engineers, across 220 organizations worldwide.

Need help finding the right learning solution? Call us: 877-629-5631