Hadoop for Data Analysts

The Hadoop for Data Analysts training course will teach you to manage, manipulate, and query large, complex data in real time, using SQL and familiar scripting languages on Hadoop.

After an introduction to Hadoop basics, we’ll move on to an in-depth exploration of how Apache Pig and Apache Hive enable data transformations and analyses via filters, joins, and user-defined functions. You will learn how to analyze and process data with Pig, and how to optimize Hive.

Course Summary

Learn how to use Hadoop to manage, manipulate, and query large complex data in real time.
This class is targeted at the non-technical data analyst role. Previous experience with a scripting language such as Python is recommended.
Learning Style: Hands-on

Hands-on training is customized, instructor-led training with an in-depth presentation of a technology and its concepts, featuring such topics as Java, OOAD, and Open Source.

Duration: 3 Days
Productivity Objectives: 
  • Understand Hadoop fundamentals
  • Know how to use Pig to analyze data
  • Understand how to process complex data with Pig
  • Troubleshoot Pig
  • Know when to use Hive
  • Know how to manage data with Hive
  • Understand how to optimize Hive
Hadoop for Data Analysts is part of the Apache Training curriculum.

What You'll Learn

In the Hadoop for Data Analysts training course you’ll learn:

  • Understanding Hadoop 2.0
    • Hadoop Overview
    • The Hadoop Ecosystem
    • The Hadoop Distributed File System (HDFS)
    • Inputting Data into HDFS
    • The MapReduce Framework and YARN
    • Overview of Sqoop/Flume
    • Overview of Oozie Workflow Engine
  • Introduction to Pig
    • Pig’s Features/Use Cases
    • Interacting with Pig
  • Basic Data Analysis with Pig
    • Pig Latin
    • Loading Data
    • Field Definitions and Simple Data Types
    • Data Output
    • Viewing the Schema
    • Filtering/Sorting Data
    • Common Functions
  • Processing Complex Data with Pig
    • Storage Formats
    • Complex/Nested Data Types
    • Grouping
    • Built-in Functions for Working with Complex Data
    • Iterating Grouped Data
  • Multi-Data Set Operations with Pig
    • Combining Data Sets
    • Joining Data Sets
    • Set Operations
    • Splitting Data Sets
  • Extending Pig
    • Parameters
    • Macros / Imports
    • UDFs
    • Using Other Languages to Process Data with Pig
  • Pig Troubleshooting and Optimization
    • Logging
    • Hadoop’s Web UI
    • Data Sampling and Debugging
    • Understanding the Execution Plan
    • Improving the Performance
  • Introduction to Hive
    • Hive Schema and Data Storage
    • Hive vs. Traditional Databases
    • Hive vs. Pig
    • When to Use Hive
  • Relational Data Analysis with Hive
    • Hive Databases and Tables
    • Basic HiveQL Syntax
    • Data Types
    • Joining Data Sets
    • Common Built-in Functions
  • Hive Data Management
    • Hive Data Formats
    • Creating Databases and Hive-managed Tables
    • Loading Data into Hive
    • Altering Databases and Tables
    • Self-managed Tables
    • Simplifying Queries with Views
    • Storing Query Results
    • Controlling Access to Data
  • Text Processing with Hive
    • Text Processing
    • Important String Functions
    • Using Regular Expressions in Hive
  • Hive Optimization
    • Understanding Query Performance
    • Controlling Job Execution Plan
    • Partitioning
    • Bucketing
    • Indexing Data
  • Extending Hive
    • Data Transformation with Custom Scripts
    • User-defined Functions
    • Parameterized Queries
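
As a taste of the "Data Transformation with Custom Scripts" topic above, here is a minimal sketch in Python. It assumes Hive streams each row to the script as a tab-separated line on standard input and reads tab-separated rows back from standard output, which is how Hive's TRANSFORM clause works. The two-column layout (name, email) and the function names are hypothetical, chosen only for illustration.

```python
import io
import sys


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row.

    Hypothetical layout: name <TAB> email.
    """
    fields = line.rstrip("\n").split("\t")
    name, email = fields[0], fields[1]
    # Normalize: trim and title-case the name, trim and lowercase the email.
    return "\t".join([name.strip().title(), email.strip().lower()])


def run(stdin, stdout):
    """Read rows from stdin, write one transformed row per non-blank line."""
    for line in stdin:
        if line.strip():  # skip blank lines
            stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    # Demo with an in-memory sample row; a script used from Hive would
    # instead call run(sys.stdin, sys.stdout).
    sample = io.StringIO("  alice SMITH \tAlice@Example.COM\n")
    run(sample, sys.stdout)
```

A script like this would typically be invoked from a query of the form SELECT TRANSFORM(name, email) USING 'python normalize.py' AS (name, email) FROM customers, where the script name and table name are placeholders.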

Meet Your Instructor


Michael is a practicing software developer, course developer, and trainer with DevelopIntelligence. For the majority of his career, Michael has designed and implemented large-scale, enterprise-grade, Java-based applications at major telecommunications and Internet companies, such as Level 3 Communications, US West/Qwest/CenturyLink, Orbitz, and others.

Michael has a passion for learning new technologies, patterns, and paradigms (or, he has a tendency to get bored or disappointed with current ones)....


Mark is an experienced, hands-on Big Data architect. He has been developing software for over 20 years in a variety of technologies (enterprise, web, HPC) and for a variety of verticals (healthcare, O&G, legal, financial). He currently focuses on Hadoop, Big Data, NoSQL, and Amazon Cloud Services. Mark has been doing Hadoop training for individuals and corporations; his classes are hands-on and draw heavily on his industry experience.
Mark stays active in the...


Rich is a full-stack generalist with a deep and wide background in architecture, development and maintenance of web-scale, mission-critical custom applications, and building / leading extraordinary technology teams.

He has spent roughly equal thirds of his two-decade career in the Fortune 500, government, and start-up arenas, where he’s served as everything from trench-level core developer to VP of Engineering. He currently spends the majority of his time sharing his knowledge about Amazon Web...


Sujee has been developing software for 15 years. In the last few years he has been consulting on and teaching Hadoop, NoSQL, and Cloud technologies.
Sujee stays active in the Hadoop / open source community. He runs a developer-focused meetup and Hadoop hackathons called ‘Big Data Gurus’. He has presented at a variety of meetups.
Sujee contributes to the Hadoop project and other open source projects. He writes about Hadoop and other technologies...


Get Custom Training Quote

We'll work with you to design a custom Hadoop for Data Analysts training program that meets your specific needs. A 100% guaranteed plan that works for you, your team, and your budget.


Chat with one of our Program Managers from our Boulder, Colorado office to discuss various training options.

DevelopIntelligence has been in the technical/software development learning and training industry for nearly 20 years. We’ve provided learning solutions to more than 48,000 engineers, across 220 organizations worldwide.

Need help finding the right learning solution?   Call us: 877-629-5631