The Introduction to Hadoop for Developers training course teaches the fundamentals of setting up a Hadoop cluster, as well as the “soup” of related technologies like Hive, Pig and Oozie.
You will learn how to access the Hadoop file system, write MapReduce jobs using Java, Pig, and Hive, and build workflows with Oozie, working with your own installation of a single-node Hadoop 2 cluster in hands-on workshops. We will discuss examples of real-world MapReduce jobs and how Hadoop has solved real-world data-intensive processing problems. Then, we will explore the different modes in which Hadoop can run, both to handle massive amounts of data and to support your MapReduce jobs during development.
Best of all, you will walk away with a fully configured virtual machine that can run under VirtualBox or VMware, with Hadoop and all related technologies installed, configured, and ready to run. The virtual machine includes the necessary development environment (Eclipse), so you can keep growing your Hadoop knowledge in a live environment right away, without the hassle of setting one up from scratch.
Prerequisites: Basic Java knowledge (experience with Eclipse is a plus); we recommend courses in our core Java catalog.
- Understand the Hadoop File System (HDFS)
- Understand general Hadoop cluster and HDFS administration
- Know what MapReduce is and why you should care
- Know how to write a MapReduce job with Java, Pig, and Hive
- Understand how the different Hadoop technologies interoperate to provide a cohesive big data solution
- Know basic management of a Hadoop cluster
- Understand how to perform basic unit testing of your MapReduce jobs
- Understand how MPI and HPC intersect with Hadoop
What You'll Learn
In the Introduction to Hadoop for Developers training course, you’ll learn:
- Hadoop Overview
- Big Data Introduction
- Comparison to Relational Databases
- Hadoop Ecosystem
- Filesystem Shell
- Accessing HDFS with Java
- Reading/Writing/Browsing File System
- Basic HDFS Admin
- Data Model
- Installation and Shell
- Access via Java API
- Scan API
- Storage Model
- Table Design
- MapReduce on YARN
- Processing Model
- Command-Line Tools
- MapReduce Framework
- Submitting MapReduce Jobs
- Writing MapReduce Jobs in Java
- MapReduce Theory
- Distributed Cache
- Speculative Execution
- YARN Components
- Details of MapReduce Job Execution
- Hadoop Streaming
- Implementing a Streaming Job
- Counters in Streaming Jobs
- Contrast with Java Jobs
- MapReduce Workflows
- Problem Decomposition into MapReduce Jobs
- Coding Workflows
- Using the JobControl Class
- Writing Oozie Workflows
- Deploying and Running Oozie Jobs
- Pig Latin
- Writing Pig Scripts
- User-Defined Functions
- Data Set Joins
- Table Creation and Deletion
- Loading Data into Hive
Meet Your Instructors
Michael is a practicing software developer, course developer, and trainer with DevelopIntelligence. For the majority of his career, Michael has designed and implemented large-scale, enterprise-grade, Java-based applications at major telecommunications and Internet companies such as Level3 Communications, US West/Qwest/CenturyLink, Orbitz, and others.
Michael has a passion for learning new technologies, patterns, and paradigms (or, he has a tendency to get bored or disappointed with current ones)...
Rich
Rich is a full-stack generalist with a deep and wide background in the architecture, development, and maintenance of web-scale, mission-critical custom applications, and in building and leading extraordinary technology teams.
He has spent roughly equal thirds of his two-decade career in the Fortune 500, government, and start-up arenas, where he has served as everything from a trench-level core developer to VP of Engineering. He currently spends the majority of his time sharing his knowledge about Amazon Web...
Mark
Mark is an experienced, hands-on big data architect. He has been developing software for over 20 years in a variety of technologies (enterprise, web, HPC) and for a variety of verticals (healthcare, oil and gas, legal, financial). He currently focuses on Hadoop, big data, NoSQL, and Amazon Cloud Services. Mark has been providing Hadoop training to individuals and corporations; his classes are hands-on and draw heavily on his industry experience.
Mark stays active in the...
Sujee
Sujee has been developing software for 15 years. In the last few years, he has been consulting and teaching Hadoop, NoSQL, and cloud technologies.
Sujee stays active in the Hadoop and open source community. He runs a developer-focused meetup and Hadoop hackathons called ‘Big Data Gurus’, and he has presented at a variety of meetups.
Sujee contributes to the Hadoop project and other open source projects. He writes about Hadoop and other technologies...
Andrew S
Andrew is a mathematician turned software engineer who loves building systems. After graduating with a PhD in pure math, he became fascinated by software startups and has since spent 20 years learning. During this period, he’s worked on a wide variety of projects and platforms, including big data analytics, enterprise optimization, mathematical finance, cross-platform middleware, and medical imaging.
In 2001, Andrew served as company architect at ProfitLogic, a pricing optimization startup...