Hadoop for Data Analysts

Course Summary

The Hadoop for Data Analysts training course demonstrates how to manage, manipulate, and query large, complex data sets in real time, using SQL and familiar scripting languages on Hadoop.

The course begins with an introduction to Hadoop basics. Next, it explores how Apache Pig and Apache Hive enable data transformations and analyses via filters, joins, and user-defined functions. The course concludes by examining how to analyze and process data with Pig, and how to optimize Hive.

Purpose
Learn how to use Hadoop to manage, manipulate, and query large, complex data sets in real time.
Audience
This class is aimed at data analysts in non-technical roles. Previous experience with a scripting language such as Python is recommended.
Role
Software Developer
Skill Level
Intermediate
Style
Workshops
Duration
3 Days
Related Technologies
Java | Hadoop | Apache

 

Productivity Objectives
  • Understand Hadoop fundamentals
  • Know how to use Pig to analyze data
  • Understand how to process complex data with Pig
  • Troubleshoot Pig
  • Know when to use Hive
  • Know how to manage data with Hive
  • Understand how to optimize Hive

What You'll Learn:

In the Hadoop for Data Analysts training course, you'll learn:
  • Understanding Hadoop
    • Hadoop Overview
    • The Hadoop Ecosystem
    • The Hadoop Distributed File System (HDFS)
    • Input Data into HDFS
    • The MapReduce Framework and YARN
    • Overview of Sqoop/Flume
    • Overview of the Oozie Workflow Engine
  • Introduction to Pig
    • Pig's Features/Use Cases
    • Interact with Pig
  • Basic Data Analysis with Pig
    • Pig Latin
    • Load Data
    • Field Definitions and Simple Data Types
    • Data Output
    • View the Schema
    • Filter/Sort Data
    • Common Functions
  • Processing Complex Data with Pig
    • Storage Formats
    • Complex/Nested data types
    • Groups
    • Built-in functions for working with complex data
    • Iterate grouped data
  • Multi-Dataset Operations with Pig
    • Combine Data Sets
    • Join Data Sets
    • Set Operations
    • Split Data Sets
  • Extending Pig
    • Parameters
    • Macros/Imports
    • UDFs
    • Use Other Languages to Process Data with Pig
  • Pig Troubleshooting and Optimization
    • Logs
    • Hadoop's Web UI
    • Data sampling and debugging
    • Understand the execution plan
    • Improve performance
  • Introduction to Hive
    • Hive schema and data storage
    • Hive vs. traditional databases
    • Hive vs. Pig
    • When to use Hive
    • Relational data analysis with Hive
    • Hive databases and tables
    • Basic HiveQL syntax
    • Data types
    • Joining data sets
    • Common built-in functions
  • Hive Data Management
    • Hive data formats
    • Create databases and Hive-managed tables
    • Load Data into Hive
    • Alter databases and tables
    • Self-managed tables
    • Simplify queries with views
    • Store query results
    • Control access to data
  • Text Processing with Hive
    • Text processing overview
    • Important string functions
    • Use regular expressions in Hive
  • Hive Optimization
    • Understand query performance
    • Control the job execution plan
    • Partitioning
    • Bucketing
    • Index Data
  • Extending Hive
    • Data Transformation with Custom Scripts
    • User-defined Functions
    • Parameterized Queries

“I appreciated the instructor's technique of writing live code examples rather than using fixed slide decks to present the material.”

VMware

Dive in and learn more

When transforming your workforce, it's important to have expert advice and tailored solutions. We can help. Tell us your unique needs and we'll explore ways to address them.
