Big Data Fast Track

Course Summary

The Big Data Fast Track provides a thorough introduction for developer and developer-operations (DevOps) job roles. Attendees receive an introduction to, and hands-on use of, all the major component frameworks in the big data ecosystem.

Purpose: Learn all about Hadoop and Big Data technologies.
Audience: Anyone wanting to develop solutions on the Hadoop platform; basic Java experience is recommended.
Role: Software Developer | Technical Manager
Skill Level: Intermediate
Style: Workshops
Duration: 3 Weeks
Related Technologies: Apache Spark | Hadoop | Scala | Java | Apache Kafka

Productivity Objectives
  • Gain development as well as operational knowledge of Hadoop
  • Gain exposure to the major Hadoop ecosystem products
  • Learn the use cases where Big Data technology has the greatest impact

What You'll Learn:

In the Big Data Fast Track training course, you'll learn:
  • HDFS
    • Overview
    • Architecture
  • HDFS Shell
    • HDFS Components
    • Using the HDFS Shell
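
For a flavor of this module, a few of the core HDFS shell commands (a minimal sketch; the paths and file name are hypothetical):

    # Create a directory, upload a local file, then list and read it back
    hdfs dfs -mkdir -p /user/student/data
    hdfs dfs -put sales.csv /user/student/data/
    hdfs dfs -ls /user/student/data
    hdfs dfs -cat /user/student/data/sales.csv
    # Copy a file from HDFS back to the local filesystem
    hdfs dfs -get /user/student/data/sales.csv ./sales-copy.csv
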
  • Getting Data into HDFS
    • Pulling data from External Sources with Flume
    • Importing Data from Relational Databases with Sqoop
    • REST Interfaces
    • Best Practices
  • Moving Data - Sqoop
    • Use Cases and Examples
    • How to use Sqoop to move data
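
As an example of the Sqoop workflow taught here, a single command imports a relational table into HDFS (a hedged sketch; the JDBC URL, credentials, and table name are hypothetical):

    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username analyst \
      --password-file /user/student/.dbpass \
      --table orders \
      --target-dir /user/student/orders \
      -m 4    # run the import with 4 parallel map tasks
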
  • Moving Data - Flume
    • Use Cases and Examples
    • How to use Flume to move data
    • Which tool to use when
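
Flume agents are wired together declaratively in a properties file. A minimal sketch that tails a log file into HDFS (the agent name, log path, and HDFS path are hypothetical):

    # Components of agent a1: one source, one channel, one sink
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    # Source: follow a log file
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/app.log
    a1.sources.r1.channels = c1
    # Channel: buffer events in memory
    a1.channels.c1.type = memory
    # Sink: write events to date-stamped HDFS directories
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
    a1.sinks.k1.channel = c1
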
  • HBase
    • Overview
    • Use Cases (When would you use it)
    • HBase Architecture
    • Designing HBase Tables
    • Storage Model
  • HBase Shell
    • Runtime Modes
    • HBase Shell Overview
    • HBase DML
    • HBase DDL
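
An HBase shell session for this module might look like the following (the table name, column family, and row key are hypothetical):

    # DDL: create a table with one column family
    create 'users', 'info'
    # DML: write a cell, read one row, scan the whole table
    put 'users', 'row1', 'info:name', 'Ada'
    get 'users', 'row1'
    scan 'users'
    # DDL: tables must be disabled before they can be dropped
    disable 'users'
    drop 'users'
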
  • HBase Java Client API (Data Access and Admin)
    • Overview
    • Using the Client API to Access HBase
    • Basic HBase Operations
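
A minimal sketch of the Java client API covered in this module (assumes the HBase 1.x+ client; the table, column family, and values are hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseClientSketch {
        public static void main(String[] args) throws Exception {
            // Connections are heavyweight: create one and share it
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("users"))) {
                // Write one cell: row key, column family, qualifier, value
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
                table.put(put);
                // Read it back
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
            }
        }
    }
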
  • MapReduce on YARN
    • Overview
    • History (V1 vs. V2)
    • MapReduce Workflow
    • Case Study Example
    • MapReduce Framework Components
    • MapReduce Configuration
  • First MapReduce Job with Java
    • Overview
    • Job Components (InputFormat, OutputFormat, etc.)
    • Mapper
    • Reducer
    • Job configuration
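
The classic first job is word count; a self-contained sketch of the Mapper, Reducer, and driver configuration (input and output paths come from the command line):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            @Override
            protected void map(Object key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                // Emit (word, 1) for every token in the input line
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) { word.set(token); ctx.write(word, ONE); }
                }
            }
        }
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();   // total count per word
                ctx.write(key, new IntWritable(sum));
            }
        }
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
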
  • MapReduce Job Execution
    • Components
    • Distributed Cache
    • Job Execution on YARN
    • Failures
  • Apache Oozie
    • Overview
    • Job Scheduling with Oozie
    • Creating declarative workflows
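
Oozie workflows are declared in XML. A minimal sketch with a single MapReduce action (the workflow name, properties, and paths are hypothetical):

    <workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.5">
      <start to="wordcount"/>
      <action name="wordcount">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <configuration>
            <property>
              <name>mapred.input.dir</name>
              <value>/user/student/input</value>
            </property>
            <property>
              <name>mapred.output.dir</name>
              <value>/user/student/output</value>
            </property>
          </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>
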
  • Apache Pig
    • Pig Architecture
    • Pig and MapReduce
    • Pig access options
    • Pig Components
    • Running Pig
    • Basic Pig Scripts
  • Joining Data Sets with Pig
    • Inner, Outer, and Full Joins
    • Building a Pig Script to Join Datasets
    • Cogroups
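
In Pig Latin, loading and joining two datasets looks like this (the file paths and field names are hypothetical):

    users  = LOAD '/data/users.csv'  USING PigStorage(',') AS (id:int, name:chararray);
    orders = LOAD '/data/orders.csv' USING PigStorage(',') AS (oid:int, user_id:int, total:double);
    -- Inner join on user id; for an outer join: JOIN users BY id LEFT OUTER, orders BY user_id
    joined  = JOIN users BY id, orders BY user_id;
    -- COGROUP keeps one bag per input per key instead of flattening
    grouped = COGROUP users BY id, orders BY user_id;
    DUMP joined;
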
  • Apache Hive
    • Overview
    • Example Use Case from Industry
    • Hive Architecture
    • Hive MetaStore
    • Hive access options
    • Creating Databases and Tables
    • Loading data
    • External vs Internal tables
    • Partitions
    • Bucketing
    • Joins
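
The HiveQL for several of these topics fits in a few statements (the database objects and paths are hypothetical):

    -- External table: Hive manages the schema, the files stay where they are
    CREATE EXTERNAL TABLE orders (id INT, user_id INT, total DOUBLE)
      PARTITIONED BY (order_date STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/data/orders';

    ALTER TABLE orders ADD PARTITION (order_date = '2015-01-01');

    -- Joins compile down to distributed jobs behind the scenes
    SELECT u.name, SUM(o.total)
    FROM users u JOIN orders o ON (u.id = o.user_id)
    GROUP BY u.name;
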
  • Hadoop Clients
    • What is a Hadoop Client
    • Installing and Configuring Hadoop Clients
    • Installing and Configuring Hue
    • Hue Authentication and Configuration
  • Hadoop Security
    • Why Hadoop Security Is Important
    • Hadoop's Security System Concepts
    • What Kerberos Is and How It Works
    • Securing a Hadoop Cluster with Kerberos
  • Managing and Scheduling Jobs
    • Managing Running Jobs
    • Scheduling Hadoop Jobs
    • Configuring the FairScheduler
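
Switching YARN to the FairScheduler is a configuration change plus an allocations file; a sketch (the queue names and weights are hypothetical):

    <!-- yarn-site.xml: select the FairScheduler -->
    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>

    <!-- fair-scheduler.xml: two queues sharing the cluster 3:1 -->
    <allocations>
      <queue name="etl">
        <weight>3.0</weight>
      </queue>
      <queue name="adhoc">
        <weight>1.0</weight>
      </queue>
    </allocations>
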
  • Cluster Monitoring and Troubleshooting
    • General System Monitoring
    • Managing Hadoop's Log Files
    • Monitoring the Cluster
    • Common Troubleshooting Issues
  • Apache Kafka
    • Overview
    • Use Cases
    • Ecosystem
  • Producer API
  • Consumer API
    • High-Level Consumer
    • Simple Consumer
  • Configuration
    • Broker
    • Consumer
    • Producer
    • New Producer
  • Design Points
    • Persistence
    • Producer
    • Consumer
    • Message Delivery
    • Replication
    • Log Compaction
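
A minimal sketch of the producer and consumer APIs in Java (assumes a recent kafka-clients library; the broker address, topic, and group id are hypothetical):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KafkaSketch {
        public static void main(String[] args) {
            Properties p = new Properties();
            p.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
            p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            // Producer: send one record to a topic
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
                producer.send(new ProducerRecord<>("events", "key1", "hello"));
            }

            Properties c = new Properties();
            c.put("bootstrap.servers", "broker1:9092");
            c.put("group.id", "demo-group");
            c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            // Consumer: subscribe to the topic and poll for records
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
                consumer.subscribe(Collections.singletonList("events"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records)
                    System.out.printf("%s -> %s%n", r.key(), r.value());
            }
        }
    }
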
  • Apache Storm
    • Overview
    • General Architecture
    • Messaging characteristics
  • Spouts
  • Bolts
  • Deploying a topology
  • Fault tolerance
  • The Trident API
    • API Overview
    • Spouts
  • Storm Metrics
  • Integrating Storm with other Big Data frameworks
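
Wiring a topology from spouts and bolts takes only a few lines; a sketch using Storm's built-in test spout (assumes Storm 1.x package names; the component names and parallelism are hypothetical):

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.testing.TestWordSpout;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Tuple;

    public class StormSketch {
        // A bolt that just prints each tuple it receives
        public static class PrinterBolt extends BaseBasicBolt {
            public void execute(Tuple input, BasicOutputCollector collector) {
                System.out.println(input.getString(0));
            }
            public void declareOutputFields(OutputFieldsDeclarer declarer) { /* emits nothing */ }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("words", new TestWordSpout());      // built-in test spout
            builder.setBolt("printer", new PrinterBolt(), 2)
                   .shuffleGrouping("words");                    // random task assignment
            // Run in-process; a production deploy would use StormSubmitter instead
            new LocalCluster().submitTopology("demo", new Config(), builder.createTopology());
        }
    }
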
  • Apache Spark
    • What is Apache Spark
  • Quick Intro to Scala
    • Basic Syntax
    • Scala Hello World
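
The Scala warm-up starts from the traditional first program:

    // Scala Hello World: an object's main method is the program's entry point
    object HelloWorld {
      def main(args: Array[String]): Unit = {
        println("Hello, World!")
      }
    }
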
  • Spark Basics
    • Using the Spark Shell
    • Resilient Distributed Datasets (RDDs)
    • Functional Programming with Spark
  • The Hadoop Distributed File System
    • Why HDFS
    • HDFS Architecture
    • Using HDFS
  • Spark and Hadoop
    • Spark and the Hadoop Ecosystem
    • Spark and MapReduce
  • RDDs
    • RDD Operations
    • Key-Value Pair RDDs
    • MapReduce and Pair RDD Operations
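
In the Spark shell (Scala), the pair-RDD version of word count ties these ideas together (the HDFS paths are hypothetical):

    // sc is the SparkContext the shell provides
    val lines  = sc.textFile("hdfs:///user/student/input")
    val counts = lines.flatMap(_.split("\\s+"))     // RDD of words
                      .map(word => (word, 1))       // key-value pair RDD
                      .reduceByKey(_ + _)           // shuffle + per-key sum
    counts.saveAsTextFile("hdfs:///user/student/output")
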
  • Running Spark on a Cluster
    • Standalone Cluster
    • The Spark Standalone Web UI
  • Parallel Programming with Spark
    • RDD Partitions and HDFS Data Locality
    • Working With Partitions
    • Executing Parallel Operations
  • Caching and Persistence
    • Distributed Persistence
    • Caching
  • Writing Spark Applications
    • SparkContext
    • Spark Properties
    • Building and Running a Spark Application
    • Logging
  • Spark Streaming
    • Streaming Overview
    • Sliding Window Operations
    • Spark Streaming Applications
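
A sliding-window word count sketches the streaming API in Scala (the socket source and window sizes are hypothetical):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf  = new SparkConf().setAppName("stream-sketch").setMaster("local[2]")
    val ssc   = new StreamingContext(conf, Seconds(1))      // 1-second micro-batches
    val lines = ssc.socketTextStream("localhost", 9999)
    // Count words over the last 30 seconds, recomputed every 10 seconds
    val counts = lines.flatMap(_.split("\\s+"))
                      .map((_, 1))
                      .reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
    counts.print()
    ssc.start()
    ssc.awaitTermination()
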
  • Common Spark Algorithms
    • Iterative Algorithms
    • Graph Analysis
    • Machine Learning
  • Improving Spark Performance
    • Shared Variables: Broadcast Variables
    • Shared Variables: Accumulators
    • Common Performance Issues
“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware

Dive in and learn more

When transforming your workforce, it's important to have expert advice and tailored solutions. We can help. Tell us your unique needs and we'll explore ways to address them.
