Big Data Fast Track

Course Summary

The Big Data Fast Track provides a thorough introduction for developer and developer-operations (DevOps) job roles. Attendees receive an introduction to, and hands-on use of, all the major component frameworks in the big data ecosystem.

Purpose: Learn all about Hadoop and Big Data technologies.
Audience: Anyone wanting to develop solutions on the Hadoop platform; basic Java experience is recommended.
Role: Software Developer | Technical Manager
Skill Level: Intermediate
Style: Workshops
Duration: 3 Weeks
Related Technologies: Apache Spark | Hadoop | Scala | Java | Apache Kafka

Productivity Objectives
  • Gain development as well as operational knowledge of Hadoop
  • Gain exposure to the major Hadoop ecosystem products
  • Learn the use cases where Big Data technology has the greatest impact

What You'll Learn:

In the Big Data Fast Track training course, you'll learn:
  • HDFS
    • Overview
    • Architecture
  • HDFS Shell
    • HDFS Components
    • Using the HDFS Shell
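
For a flavor of this module, a few of the core HDFS shell commands (a minimal sketch; the paths and file name are hypothetical):

    # Create a directory, upload a local file, then list and read it back
    hdfs dfs -mkdir -p /user/student/data
    hdfs dfs -put sales.csv /user/student/data/
    hdfs dfs -ls /user/student/data
    hdfs dfs -cat /user/student/data/sales.csv
    # Copy a file from HDFS back to the local filesystem
    hdfs dfs -get /user/student/data/sales.csv ./sales-copy.csv
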
  • Getting Data into HDFS
    • Pulling data from External Sources with Flume
    • Importing Data from Relational Databases with Sqoop
    • REST Interfaces
    • Best Practices
  • Moving Data - Sqoop
    • Use Cases and Examples
    • How to use Sqoop to move data
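
As an example of the Sqoop workflow taught here, a single command imports a relational table into HDFS (a hedged sketch; the JDBC URL, credentials, and table name are hypothetical):

    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username analyst \
      --password-file /user/student/.dbpass \
      --table orders \
      --target-dir /user/student/orders \
      -m 4    # run the import with 4 parallel map tasks
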
  • Moving Data - Flume
    • Use Cases and Examples
    • How to use Flume to move data
    • Which tool to use when
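
Flume agents are wired together declaratively in a properties file. A minimal sketch that tails a log file into HDFS (the agent name, log path, and HDFS path are hypothetical):

    # Components of agent a1: one source, one channel, one sink
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    # Source: follow a log file
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/app.log
    a1.sources.r1.channels = c1
    # Channel: buffer events in memory
    a1.channels.c1.type = memory
    # Sink: write events to date-stamped HDFS directories
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
    a1.sinks.k1.channel = c1
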
  • HBase
    • Overview
    • Use Cases (When would you use it)
    • HBase Architecture
    • Designing HBase Tables
    • Storage Model
  • HBase Shell
    • Runtime Modes
    • HBase Shell Overview
    • HBase DML
    • HBase DDL
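
An HBase shell session for this module might look like the following (the table name, column family, and row key are hypothetical):

    # DDL: create a table with one column family
    create 'users', 'info'
    # DML: write a cell, read one row, scan the whole table
    put 'users', 'row1', 'info:name', 'Ada'
    get 'users', 'row1'
    scan 'users'
    # DDL: tables must be disabled before they can be dropped
    disable 'users'
    drop 'users'
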
  • HBase Java Client API (Data Access and Admin)
    • Overview
    • Using the Client API to Access HBase
    • Basic HBase Operations
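
A minimal sketch of the Java client API covered in this module (assumes the HBase 1.x+ client; the table, column family, and values are hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseClientSketch {
        public static void main(String[] args) throws Exception {
            // Connections are heavyweight: create one and share it
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("users"))) {
                // Write one cell: row key, column family, qualifier, value
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
                table.put(put);
                // Read it back
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
            }
        }
    }
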
  • MapReduce on YARN
    • Overview
    • History (V1 vs. V2)
    • MapReduce Workflow
    • Case Study Example
    • MapReduce Framework Components
    • MapReduce Configuration
  • First MapReduce Job with Java
    • Overview
    • Job Components (InputFormat, OutputFormat, etc.)
    • Mapper
    • Reducer
    • Job configuration
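
The classic first job is word count; a self-contained sketch of the Mapper, Reducer, and driver configuration (input and output paths come from the command line):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            @Override
            protected void map(Object key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                // Emit (word, 1) for every token in the input line
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) { word.set(token); ctx.write(word, ONE); }
                }
            }
        }
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();   // total count per word
                ctx.write(key, new IntWritable(sum));
            }
        }
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
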
  • MapReduce Job Execution
    • Components
    • Distributed Cache
    • Job Execution on YARN
    • Failures
  • Apache Oozie
    • Overview
    • Job Scheduling with Oozie
    • Creating declarative workflows
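
Oozie workflows are declared in XML. A minimal sketch with a single MapReduce action (the workflow name, properties, and paths are hypothetical):

    <workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.5">
      <start to="wordcount"/>
      <action name="wordcount">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <configuration>
            <property>
              <name>mapred.input.dir</name>
              <value>/user/student/input</value>
            </property>
            <property>
              <name>mapred.output.dir</name>
              <value>/user/student/output</value>
            </property>
          </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>
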
  • Apache Pig
    • Pig Architecture
    • Pig and MapReduce
    • Pig access options
    • Pig Components
    • Running Pig
    • Basic Pig Scripts
  • Joining Data Sets with Pig
    • Inner, Outer, and Full Joins
    • Building a Pig Script to Join Datasets
    • Cogroups
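
In Pig Latin, loading and joining two datasets looks like this (the file paths and field names are hypothetical):

    users  = LOAD '/data/users.csv'  USING PigStorage(',') AS (id:int, name:chararray);
    orders = LOAD '/data/orders.csv' USING PigStorage(',') AS (oid:int, user_id:int, total:double);
    -- Inner join on user id; for an outer join: JOIN users BY id LEFT OUTER, orders BY user_id
    joined  = JOIN users BY id, orders BY user_id;
    -- COGROUP keeps one bag per input per key instead of flattening
    grouped = COGROUP users BY id, orders BY user_id;
    DUMP joined;
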
  • Apache Hive
    • Overview
    • Example Use Case from Industry
    • Hive Architecture
    • Hive MetaStore
    • Hive access options
    • Creating Databases and Tables
    • Loading data
    • External vs Internal tables
    • Partitions
    • Bucketing
    • Joins
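
The HiveQL for several of these topics fits in a few statements (the database objects and paths are hypothetical):

    -- External table: Hive manages the schema, the files stay where they are
    CREATE EXTERNAL TABLE orders (id INT, user_id INT, total DOUBLE)
      PARTITIONED BY (order_date STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/data/orders';

    ALTER TABLE orders ADD PARTITION (order_date = '2015-01-01');

    -- Joins compile down to distributed jobs behind the scenes
    SELECT u.name, SUM(o.total)
    FROM users u JOIN orders o ON (u.id = o.user_id)
    GROUP BY u.name;
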
  • Hadoop Clients
    • What is a Hadoop Client
    • Installing and Configuring Hadoop Clients
    • Installing and Configuring Hue
    • Hue Authentication and Configuration
  • Hadoop Security
    • Why Hadoop Security Is Important
    • Hadoop's Security System Concepts
    • What Kerberos Is and How It Works
    • Securing a Hadoop Cluster with Kerberos
  • Managing and Scheduling Jobs
    • Managing Running Jobs
    • Scheduling Hadoop Jobs
    • Configuring the FairScheduler
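
Switching YARN to the FairScheduler is a configuration change plus an allocations file; a sketch (the queue names and weights are hypothetical):

    <!-- yarn-site.xml: select the FairScheduler -->
    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>

    <!-- fair-scheduler.xml: two queues sharing the cluster 3:1 -->
    <allocations>
      <queue name="etl">
        <weight>3.0</weight>
      </queue>
      <queue name="adhoc">
        <weight>1.0</weight>
      </queue>
    </allocations>
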
  • Cluster Monitoring and Troubleshooting
    • General System Monitoring
    • Managing Hadoop's Log Files
    • Monitoring the Cluster
    • Common Troubleshooting Issues
  • Apache Kafka
    • Overview
    • Use Cases
    • Ecosystem
  • Producer API
  • Consumer API
    • High-Level Consumer
    • Simple Consumer
  • Configuration
    • Broker
    • Consumer
    • Producer
    • New Producer
  • Design Points
    • Persistence
    • Producer
    • Consumer
    • Message Delivery
    • Replication
    • Log Compaction
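
A minimal sketch of the producer and consumer APIs in Java (assumes a recent kafka-clients library; the broker address, topic, and group id are hypothetical):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KafkaSketch {
        public static void main(String[] args) {
            Properties p = new Properties();
            p.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
            p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            // Producer: send one record to a topic
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
                producer.send(new ProducerRecord<>("events", "key1", "hello"));
            }

            Properties c = new Properties();
            c.put("bootstrap.servers", "broker1:9092");
            c.put("group.id", "demo-group");
            c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            // Consumer: subscribe to the topic and poll for records
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
                consumer.subscribe(Collections.singletonList("events"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records)
                    System.out.printf("%s -> %s%n", r.key(), r.value());
            }
        }
    }
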
  • Apache Storm
    • Overview
    • General Architecture
    • Messaging characteristics
  • Spouts
  • Bolts
  • Deploying a topology
  • Fault tolerance
  • The Trident API
    • API Overview
    • Spouts
  • Storm Metrics
  • Integrating Storm with other Big Data frameworks
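
Wiring a topology from spouts and bolts takes only a few lines; a sketch using Storm's built-in test spout (assumes Storm 1.x package names; the component names and parallelism are hypothetical):

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.testing.TestWordSpout;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Tuple;

    public class StormSketch {
        // A bolt that just prints each tuple it receives
        public static class PrinterBolt extends BaseBasicBolt {
            public void execute(Tuple input, BasicOutputCollector collector) {
                System.out.println(input.getString(0));
            }
            public void declareOutputFields(OutputFieldsDeclarer declarer) { /* emits nothing */ }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("words", new TestWordSpout());      // built-in test spout
            builder.setBolt("printer", new PrinterBolt(), 2)
                   .shuffleGrouping("words");                    // random task assignment
            // Run in-process; a production deploy would use StormSubmitter instead
            new LocalCluster().submitTopology("demo", new Config(), builder.createTopology());
        }
    }
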
  • Apache Spark
    • What is Apache Spark
  • Quick Intro to Scala
    • Basic Syntax
    • Scala Hello World
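
The Scala warm-up starts from the traditional first program:

    // Scala Hello World: an object's main method is the program's entry point
    object HelloWorld {
      def main(args: Array[String]): Unit = {
        println("Hello, World!")
      }
    }
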
  • Spark Basics
    • Using the Spark Shell
    • Resilient Distributed Datasets (RDDs)
    • Functional Programming with Spark
  • The Hadoop Distributed File System
    • Why HDFS
    • HDFS Architecture
    • Using HDFS
  • Spark and Hadoop
    • Spark and the Hadoop Ecosystem
    • Spark and MapReduce
  • RDDs
    • RDD Operations
    • Key-Value Pair RDDs
    • MapReduce and Pair RDD Operations
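
In the Spark shell (Scala), the pair-RDD version of word count ties these ideas together (the HDFS paths are hypothetical):

    // sc is the SparkContext the shell provides
    val lines  = sc.textFile("hdfs:///user/student/input")
    val counts = lines.flatMap(_.split("\\s+"))     // RDD of words
                      .map(word => (word, 1))       // key-value pair RDD
                      .reduceByKey(_ + _)           // shuffle + per-key sum
    counts.saveAsTextFile("hdfs:///user/student/output")
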
  • Running Spark on a Cluster
    • Standalone Cluster
    • The Spark Standalone Web UI
  • Parallel Programming with Spark
    • RDD Partitions and HDFS Data Locality
    • Working With Partitions
    • Executing Parallel Operations
  • Caching and Persistence
    • Distributed Persistence
    • Caching
  • Writing Spark Applications
    • SparkContext
    • Spark Properties
    • Building and Running a Spark Application
    • Logging
  • Spark Streaming
    • Streaming Overview
    • Sliding Window Operations
    • Spark Streaming Applications
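
A sliding-window word count sketches the streaming API in Scala (the socket source and window sizes are hypothetical):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf  = new SparkConf().setAppName("stream-sketch").setMaster("local[2]")
    val ssc   = new StreamingContext(conf, Seconds(1))      // 1-second micro-batches
    val lines = ssc.socketTextStream("localhost", 9999)
    // Count words over the last 30 seconds, recomputed every 10 seconds
    val counts = lines.flatMap(_.split("\\s+"))
                      .map((_, 1))
                      .reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
    counts.print()
    ssc.start()
    ssc.awaitTermination()
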
  • Common Spark Algorithms
    • Iterative Algorithms
    • Graph Analysis
    • Machine Learning
  • Improving Spark Performance
    • Shared Variables: Broadcast Variables
    • Shared Variables: Accumulators
    • Common Performance Issues
“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware

Dive in and learn more

When transforming your workforce, it's important to have expert advice and tailored solutions. We can help. Tell us your unique needs and we'll explore ways to address them.
