Big Data Bootcamp

The Big Data Bootcamp provides a thorough introduction for developer and developer-operations job roles. Attendees receive an introduction to, and hands-on practice with, all the major component frameworks in the big data ecosystem.

Course Summary

Learn all about Hadoop and Big Data technologies.
This course is for anyone who wants to develop solutions on the Hadoop platform. Basic Java experience is recommended.

Boot Camp training is fast-tracked, hands-on, instructor-led training covering multiple related concepts and technologies in a condensed fashion.

Duration: 3 Weeks
Productivity Objectives: 
  • Gain both development and operational knowledge of Hadoop
  • Gain exposure to the major Hadoop ecosystem products
  • Learn the use cases where Big Data technology has the greatest impact

What You'll Learn

Week 1

  • Hadoop Introduction
  • A Brief History of Hadoop
  • Core Hadoop Components
  • Fundamental Concepts

  • HDFS
    • Overview
    • Architecture
  • HDFS Shell
    • HDFS Components
    • Using the HDFS Shell
  • Getting Data into HDFS
    • Pulling data from External Sources with Flume
    • Importing Data from Relational Databases with Sqoop
    • REST Interfaces
    • Best Practices
  • Moving Data – Sqoop
    • Use Cases/examples
    • How to use Sqoop to move data
  • Moving Data – Flume
    • Use Cases/Examples
    • How to use Flume to move data
    • What tool when?
  • Apache HBase
    • Overview
    • Use Cases (when would you use it?)
    • HBase Architecture
    • Designing HBase Tables
    • Storage Model
  • HBase Shell
    • Runtime Modes
    • HBase Shell Overview
  • HBase Java Client API (Data Access and Admin)
    • Overview
    • Using the Client API to Access HBase
    • Basic HBase Operations
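The storage model covered above can be sketched in plain Python. This is a conceptual illustration only (the `MiniHBaseTable` class and its methods are invented for this sketch, not the HBase client API): cells live in a sorted map keyed by row key, column (family:qualifier), and timestamp, and a read returns the newest version.

```python
# Conceptual sketch of HBase's storage model: cells are stored sorted by
# (row key, column family:qualifier) with one value per timestamp, and a
# read returns the most recent version. NOT the real HBase API.

class MiniHBaseTable:
    def __init__(self):
        self.cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, value, timestamp):
        self.cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column):
        """Return the latest version of a cell, like a default Get."""
        versions = self.cells.get((row, column))
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Rows are kept sorted by row key, so a scan is a range read."""
        result = {}
        for (row, column), versions in sorted(self.cells.items()):
            if start_row <= row < stop_row:
                result.setdefault(row, {})[column] = versions[max(versions)]
        return result

table = MiniHBaseTable()
table.put("user1", "info:name", "Ada", timestamp=1)
table.put("user1", "info:name", "Ada L.", timestamp=2)  # newer version wins
table.put("user2", "info:name", "Grace", timestamp=1)
```

Because rows are sorted by key, row-key design (covered under "Designing HBase Tables") determines which scans are cheap.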
  • Map Reduce on YARN
    • Overview
    • History (V1 vs V2)
    • Map Reduce Workflow
    • Case Study/Example
    • Map Reduce Framework Components
    • Map Reduce Configuration
  • First Map Reduce Job with Java
    • Overview
    • Job Components (InputFormat, OutputFormat, etc.)
    • Mapper
    • Reducer
    • Job configuration
  • Map Reduce Job Execution
    • Components
    • Distributed Cache
    • Job Execution on YARN
    • Failures
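The Map Reduce workflow above can be sketched in pure Python. This is a model of the programming model only, not Hadoop's Java API: Hadoop runs mappers and reducers as distributed tasks on YARN, while here the same three phases (map, shuffle/sort, reduce) run in a single process on a word-count example.

```python
# Pure-Python sketch of the MapReduce workflow (word count).
# The map, shuffle/sort, and reduce phases below mirror what Hadoop
# distributes across a cluster; function names are illustrative.
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) for every word, like a Mapper's map() call.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group values by key and sort, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    # Sum the counts for one key, like a Reducer's reduce() call.
    return (key, sum(values))

lines = ["the quick brown fox", "the lazy dog"]
mapped = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(k, vs) for k, vs in shuffle(mapped))
```

In the real framework these phases map onto the Job Components listed above: InputFormat feeds the mappers, the shuffle happens between tasks, and OutputFormat writes the reducer output.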
  • Apache Oozie
    • Overview
    • Job Scheduling with Oozie
    • Creating declarative workflows
  • Apache Pig
    • Pig Architecture
    • Pig and Map Reduce
    • Pig access options
    • Pig Components
    • Running Pig
    • Basic Pig Scripts
  • Joining Data Sets with Pig
    • Inner/Outer/Full Joins
    • Building a Pig Script to Join Datasets
    • Cogroups
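The join types and cogroups above are provided in Pig by the JOIN and COGROUP operators; their semantics can be sketched in plain Python (this is not Pig Latin, and the helper functions are invented for illustration). The key idea: a join is a cogroup followed by a per-key cross product, with empty sides padded for outer joins.

```python
# Plain-Python sketch of the semantics behind Pig's COGROUP and JOIN
# operators (inner/outer). Illustrative only -- not Pig Latin.
from collections import defaultdict

def cogroup(left, right):
    # COGROUP: for each key, collect all matching tuples from each side.
    grouped = defaultdict(lambda: ([], []))
    for key, value in left:
        grouped[key][0].append(value)
    for key, value in right:
        grouped[key][1].append(value)
    return dict(grouped)

def join(left, right, how="inner"):
    # A join is a cogroup followed by a cross product per key;
    # outer joins pad the empty side with None.
    result = []
    for key, (lvals, rvals) in cogroup(left, right).items():
        if how in ("outer", "left") and not rvals:
            rvals = [None]
        if how in ("outer", "right") and not lvals:
            lvals = [None]
        for lv in lvals:
            for rv in rvals:
                result.append((key, lv, rv))
    return result

users = [(1, "ada"), (2, "grace")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]
```

Key 1 matches twice, key 2 only exists on the left, and key 3 only on the right, which is exactly where inner and full outer joins differ.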
  • Apache HIVE
    • Overview
    • Example/Use Case from Industry
    • Hive Architecture
    • Hive MetaStore
    • Hive access options
    • Creating Databases/Tables
    • Loading data
    • External vs Internal tables
    • Partitions
    • Bucketing
    • Joins
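Two of the Hive topics above, partitions and bucketing, come down to data layout: each partition column value becomes a directory under the table's warehouse path, and each row in a bucketed table lands in a file chosen by hashing the bucket column. The sketch below models that layout logic; the paths and the hash function are illustrative assumptions, not Hive's actual implementation.

```python
# Sketch of how a partitioned, bucketed Hive table lays out data:
# partition column values become directories, and each row's bucket
# file is hash(bucket column) % num_buckets. Hive's real hash and
# file naming differ -- this only illustrates the idea.

def partition_path(table, partition_cols, row):
    # e.g. /warehouse/sales/dt=2024-01-01 for a table partitioned by dt
    parts = "/".join(f"{col}={row[col]}" for col in partition_cols)
    return f"/warehouse/{table}/{parts}"

def bucket_file(row, bucket_col, num_buckets):
    # A deterministic toy hash keeps the example reproducible.
    h = sum(ord(ch) for ch in str(row[bucket_col]))
    return f"bucket_{h % num_buckets:05d}"

row = {"dt": "2024-01-01", "user_id": "u42", "amount": 9.99}
path = partition_path("sales", ["dt"], row)
fname = bucket_file(row, "user_id", 4)
```

This is why partition columns make good filters (whole directories are skipped) and why bucketing on a join key can speed up joins: matching keys always land in the same bucket number.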

Week 2

  • Hadoop Clients
    • What is a Hadoop Client?
    • Installing and Configuring Hadoop Clients
    • Installing and Configuring Hue
  • Hue Authentication and Configuration

  • Hadoop Security
    • Why Hadoop Security Is Important
    • Hadoop’s Security System Concepts
    • What Kerberos Is and How it Works
    • Securing a Hadoop Cluster with Kerberos
  • Managing and Scheduling Jobs
    • Managing Running Jobs
    • Scheduling Hadoop Jobs
    • Configuring the FairScheduler
  • Cluster Monitoring and Troubleshooting
    • General System Monitoring
    • Managing Hadoop’s Log Files
    • Monitoring the Clusters
    • Common Troubleshooting Issues
  • Apache Kafka
    • Overview
    • Use Cases
    • Ecosystem
  • Producer API
  • Consumer API
    • High Level
    • Simple
  • Configuration
    • Broker
    • Consumer
    • Producer
    • New Producer
  • Design Points
    • Persistence
    • Producer
    • Consumer
    • Message Delivery
    • Replication
    • Log Compaction
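Two of the design points above, persistence and message delivery, rest on one idea: a Kafka partition is an append-only log, and each consumer tracks its own read offset. The stdlib sketch below models that design; the `MiniPartition` and `MiniConsumer` classes are invented for illustration and are not the Kafka client API.

```python
# Conceptual sketch of Kafka's core design: a partition is an
# append-only log, and consumers own their read offsets, so many
# consumers can read the same log independently and at their own pace.
# NOT the Kafka client API -- a model of the design only.

class MiniPartition:
    def __init__(self):
        self.log = []  # append-only; an offset is just a list index

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset of the new message

class MiniConsumer:
    def __init__(self, partition):
        self.partition = partition
        self.offset = 0  # consumer-owned position, like a committed offset

    def poll(self, max_messages=10):
        batch = self.partition.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch

p = MiniPartition()
for msg in ["a", "b", "c"]:
    p.append(msg)
fast, slow = MiniConsumer(p), MiniConsumer(p)
first = fast.poll()                   # reads all three messages
second = slow.poll(max_messages=2)    # independent, slower offset
```

Because the broker never deletes a message just because one consumer read it, replaying data is as simple as resetting an offset.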
  • Apache Storm
    • Overview
    • General Architecture
    • Messaging characteristics
  • Spouts
  • Bolts
  • Deploying a topology
  • Fault tolerance
  • The Trident API
    • API Overview
    • Spouts
  • Storm Metrics
  • Integrating Storm with other Big Data frameworks

Week 3

  • Apache Spark
    • What is Apache Spark?
  • Quick Intro to Scala
    • Basic Syntax
    • Scala Hello World
  • Spark Basics
    • Using the Spark Shell
    • Resilient Distributed Datasets (RDDs)
    • Functional Programming with Spark
  • The Hadoop Distributed File System
    • Why HDFS?
    • HDFS Architecture
    • Using HDFS
  • Spark and Hadoop
    • Spark and the Hadoop Ecosystem
    • Spark and MapReduce
  • RDDs
    • RDD Operations
    • Key-Value Pair RDDs
    • MapReduce and Pair RDD Operations
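The pair-RDD operations above can be sketched in stdlib Python. The `MiniRDD` class is invented for illustration and is not the Spark API: real RDDs are lazy, partitioned, and distributed, while this sketch is eager and single-process, but `map`, `filter`, and `reduceByKey` behave the same way on small data.

```python
# Minimal sketch of pair-RDD semantics: transformations build a new
# dataset, and reduceByKey merges values per key (Spark also combines
# per-partition before shuffling). A model of the API, not the API.
from collections import defaultdict
from functools import reduce

class MiniRDD:
    def __init__(self, data):
        self.data = list(data)

    def map(self, f):
        return MiniRDD(f(x) for x in self.data)

    def filter(self, pred):
        return MiniRDD(x for x in self.data if pred(x))

    def reduceByKey(self, f):
        groups = defaultdict(list)
        for key, value in self.data:
            groups[key].append(value)
        return MiniRDD((k, reduce(f, vs)) for k, vs in groups.items())

    def collect(self):
        return self.data

words = MiniRDD(["spark", "hadoop", "spark", "hive"])
counts = (words.map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b)
               .collect())
```

Note how this chain is the word-count MapReduce job from Week 1 expressed as two short transformations, which is much of Spark's appeal over hand-written MapReduce.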
  • Running Spark on a Cluster
    • Standalone Cluster
    • The Spark Standalone Web UI
  • Parallel Programming with Spark
    • RDD Partitions and HDFS Data Locality
    • Working With Partitions
    • Executing Parallel Operations
  • Caching and Persistence
    • Distributed Persistence
    • Caching
  • Writing Spark Applications
    • SparkContext
    • Spark Properties
    • Building and Running a Spark Application
    • Logging
  • Spark Streaming
    • Streaming Overview
    • Sliding Window Operations
    • Spark Streaming Applications
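The sliding window operations above aggregate over the last N micro-batches, advancing by a slide interval. The sketch below models that idea with a window length of 3 batches and a slide of 1; it is conceptual only (the function and its parameters are invented for illustration, not the DStream API).

```python
# Sketch of a sliding-window count over micro-batches, the idea behind
# Spark Streaming's window operations: each emitted result aggregates
# the most recent `window_length` batches. Conceptual only.
from collections import deque

def sliding_window_counts(batches, window_length, slide_interval):
    window = deque(maxlen=window_length)  # keeps only the newest batches
    results = []
    for i, batch in enumerate(batches, start=1):
        window.append(batch)
        if i % slide_interval == 0:
            counts = {}
            for b in window:
                for event in b:
                    counts[event] = counts.get(event, 0) + 1
            results.append(counts)
    return results

batches = [["click"], ["click", "view"], ["view"], ["click"]]
windows = sliding_window_counts(batches, window_length=3, slide_interval=1)
```

Early windows cover fewer batches until the window fills; after that, each new batch pushes the oldest one out, which is exactly the "sliding" behavior.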
  • Common Spark Algorithms
    • Iterative Algorithms
    • Graph Analysis
    • Machine Learning
  • Improving Spark Performance
    • Shared Variables: Broadcast Variables
    • Shared Variables: Accumulators
    • Common Performance Issues

Meet Your Instructors


Rich is a full-stack generalist with a deep and wide background in the architecture, development, and maintenance of web-scale, mission-critical custom applications, and in building and leading extraordinary technology teams.

He has spent about equal thirds of his two-decade career in the Fortune 500, government, and start-up arenas, where he's served as everything from trench-level core developer to VP of Engineering. He currently spends the majority of his time sharing his knowledge about Amazon Web...

Meet Rich »

Sujee has been developing software for 15 years. For the last few years he has been consulting on and teaching Hadoop, NoSQL, and cloud technologies. Sujee stays active in the Hadoop and open source community. He runs a developer-focused meetup and Hadoop hackathons called 'Big Data Gurus', and has presented at a variety of meetups. Sujee contributes to the Hadoop project and other open source projects, and writes about Hadoop and other technologies on his website.

Meet Sujee »
Andrew S

Andrew is a mathematician turned software engineer who loves building systems. After graduating with a PhD in pure math, he became fascinated by software startups and has since spent 20 years learning. During this period, he’s worked on a wide variety of projects and platforms, including big data analytics, enterprise optimization, mathematical finance, cross-platform middleware, and medical imaging.

In 2001, Andrew served as company architect at ProfitLogic, a pricing optimization startup...

Meet Andrew S »

Contact us to learn more

Not all training courses are created equal. Let the customization process begin! We'll work with you to design a custom Big Data Bootcamp training course that meets your specific needs.

DevelopIntelligence has been in the technical/software development learning and training industry for nearly 20 years. We've provided learning solutions to more than 48,000 engineers across 220 organizations worldwide.

About Develop Intelligence


Need help finding the right learning solution?   Call us: 877-629-5631