Apache Pig Tutorial

Before 2006, MapReduce programs could be written only in the Java programming language.

Developers had to keep the map, sort/shuffle, and reduce fundamentals in mind while writing a program, even when they only needed common operations such as joins and filters. The challenges kept building up when maintaining, optimizing, and extending the code, and production time increased as a result. Data flow in MapReduce was also quite rigid, where the output of one task could only be used as the input of another. To overcome these issues, Pig was developed in late 2006 by Yahoo researchers. It later became an Apache open-source project. Pig provides another language, besides Java, in which MapReduce programs can be written.


What is Pig in Hadoop?

Pig is a scripting platform that runs on Hadoop clusters and is designed to process and analyze large datasets. Pig is extensible, self-optimizing, and easy to program.

Programmers can use Pig to write data transformations without knowing Java. Pig uses both structured and unstructured data as input to perform analytics and uses HDFS to store the results.

Pig - Example

Yahoo scientists use grid tools to scan through petabytes of data. Many of them write scripts to test a theory or gain deeper insights; however, in the data factory, data may not be in a standardized state. This makes Pig a good option, as it supports data with partial or unknown schemas as well as semi-structured or unstructured data.

Components of Pig

Pig has two major components:

  • Pig Latin script language
  • A runtime engine

Pig Latin script language

The Pig Latin script is a procedural data-flow language. It contains syntax and commands that can be applied to implement business logic. Examples of Pig Latin commands are LOAD and STORE.
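For instance, a minimal Pig Latin script (the file name and fields here are hypothetical) loads a dataset, applies a piece of business logic, and stores the result:

data = LOAD 'input.txt' AS (name, age);  -- read records into a relation
adults = FILTER data BY age > 18;  -- apply business logic
STORE adults INTO 'output';  -- write the result to the file system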

A runtime engine

The runtime engine is a compiler that produces sequences of MapReduce programs. It uses HDFS to store and retrieve data. It is also used to interact with the Hadoop system (HDFS and MapReduce).

The runtime engine parses, validates, and compiles the script operations into a sequence of MapReduce jobs.


How Pig Works and Stages of Pig Operations

Pig operations can be explained in the following three stages:

Stage 1: Load data and write Pig script

In this stage, data is loaded and the Pig script is written.

A = LOAD 'myfile' AS (x, y, z);  -- load the input with a three-field schema
B = FILTER A BY x > 0;  -- keep only the records where x is positive
C = GROUP B BY x;  -- group the filtered records by x
D = FOREACH C GENERATE group, COUNT(B);  -- count the records in each group
STORE D INTO 'output';  -- write the result to the file system

Stage 2: Pig Operations

In the second stage, the Pig execution engine parses and validates the script. If the script passes these checks, it is optimized, and a logical and physical plan is generated for execution.

The plan is submitted to Hadoop as a MapReduce job. Pig monitors the status of the job using the Hadoop API and reports it to the client.

Stage 3: Execution of the plan

In the final stage, results are either dumped to the screen or stored in HDFS, depending on the user's command.

Let us now understand a few salient features of Pig.

Salient Features of Pig

Developers and analysts like to use Pig as it offers many features. Some of the features are as follows:

  • Provision for step-by-step procedural control and the ability to operate directly over files
  • Schemas that, though optional, can be assigned dynamically
  • Support for User Defined Functions (UDFs) and various data types

Data Model in Pig

As part of its data model, Pig supports four basic types.

  1. Atom: It is a simple atomic value, such as an int, long, double, or chararray (string).
  2. Tuple: It is a sequence of fields that can be of any data type.
  3. Bag: It is a collection of tuples of potentially varying structures and can contain duplicates.
  4. Map: It is an associative array.

The key must be a chararray, but the value can be of any type. By default, Pig treats undeclared fields as bytearrays, which are collections of uninterpreted bytes. Pig can infer a field’s type based on the use of operators that expect a certain type of field. It can also use User Defined Functions, or UDFs, with a known or explicitly set return type. Furthermore, it can infer the field type based on schema information provided by a LOAD function or explicitly declared using an AS clause. Please note that type conversion is lazy, which means the data type is enforced only at the point of execution.
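As a short sketch of how these rules play out (the file name and fields are hypothetical), compare a typed LOAD with an untyped one:

-- fields declared with explicit types via the AS clause
students = LOAD 'students.dat' AS (name:chararray, age:int, gpa:double);

-- undeclared fields default to bytearray; a cast forces the type,
-- but the conversion is only enforced lazily, at execution time
raw = LOAD 'students.dat' AS (name, age, gpa);
adults = FILTER raw BY (int)age >= 18;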

Nested Data Model

Pig Latin has a fully nestable data model with atomic values, tuples, bags (or lists), and maps. This means one data type can be nested within another, as shown in the diagram below.

[Diagram: Pig Latin Nested Data Model]
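As a hypothetical illustration of this nesting, the schema below declares a bag of tuples and a map inside a single record:

orders = LOAD 'orders.dat' AS (
    c_id:int,
    items:bag{t:tuple(sku:chararray, qty:int)},  -- a bag of tuples
    attrs:map[]  -- an associative array
);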

The advantage is that this model is more natural to programmers than flat tuples. It also avoids expensive joins. Now we will look at the different execution modes Pig works in.


Pig Execution Modes

Pig works in two execution modes: Local and MapReduce.

Local mode

In local mode, the Pig engine takes input from the local file system, and the output is stored in the same file system. Local mode is illustrated in the diagram below.

[Diagram: Pig Execution Mode - Local mode]

MapReduce mode

In MapReduce mode, the Pig engine interacts directly with HDFS and executes jobs on MapReduce, as shown in the diagram below.

[Diagram: Pig Execution Mode - MapReduce mode]
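The execution mode is selected with the -x flag when the Pig shell is launched:

$ pig -x local        # read from and write to the local file system
$ pig -x mapreduce    # default mode: use HDFS and run jobs on MapReduce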

Let us now look into interactive modes of Pig.

Pig Interactive Modes

The two modes in which a Pig Latin program can be written are Interactive and Batch.

Interactive mode

Interactive mode means coding and executing the script line by line, as the example below illustrates.

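In this mode, statements are typed one at a time at the Grunt shell prompt, and each relation can be inspected as you go (the relation and file names here are hypothetical):

grunt> A = LOAD 'myfile' AS (x, y, z);
grunt> B = FILTER A BY x > 0;
grunt> DUMP B;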

Batch mode

In batch mode, the whole script is coded in a file with the extension .pig, and the file is executed directly, as shown below.

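For example, the same statements can be saved to a file, say myscript.pig (a hypothetical name), and submitted in one go:

$ pig -x mapreduce myscript.pig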

Since we have already learned about Hive and Impala, which work on SQL, let us now see how Pig is different from SQL.

Pig vs. SQL

Given below are some differences between Pig and SQL.

  • Definition: Pig is a scripting language used to interact with HDFS, whereas SQL is a query language used to interact with databases residing in the database engine.
  • Query style: Pig offers a step-by-step execution style, whereas SQL offers a single-block execution style.
  • Evaluation: Pig performs lazy evaluation, which means that data is processed only when a STORE or DUMP command is encountered, whereas SQL evaluates a query immediately.
  • Pipeline splits: Pipeline splits are supported in Pig; in SQL, a query must be run twice for its result to be materialized as an intermediate result.

Now that we have gone through the differences between Pig and SQL, let us understand them further with an example.

Pig vs. SQL - Example

The example given below will help you understand a SQL command and its equivalent Pig script.

Track customers in Texas who spend more than $2,000.

SQL

SELECT c.c_id, SUM(s.amount) AS CTotal
FROM customers c
JOIN sales s ON c.c_id = s.c_id
WHERE c.city = 'Texas'
GROUP BY c.c_id
HAVING SUM(s.amount) > 2000
ORDER BY CTotal DESC;

The SQL query selects c_id and CTotal, the sum of the amounts, from the customers table joined with the sales table on c_id, keeping only rows where c.city is 'Texas'. It groups by c_id, retains only the groups whose summed amount exceeds 2,000, and orders the result by CTotal in descending order.

Pig

customers = LOAD '/data/customer.dat' AS (c_id, name, city);  -- load customers with a schema
sales = LOAD '/data/sales.dat' AS (s_id, c_id, date, amount);  -- load sales with a schema
customersTX = FILTER customers BY city == 'Texas';  -- keep only Texas customers
joined = JOIN customersTX BY c_id, sales BY c_id;  -- join on the c_id column
grouped = GROUP joined BY customersTX::c_id;  -- group the joined records per customer
summed = FOREACH grouped GENERATE group, SUM(joined.sales::amount) AS total;  -- total per customer
spenders = FILTER summed BY total > 2000;  -- keep customers who spend more than $2,000
sorted = ORDER spenders BY total DESC;  -- sort in descending order
DUMP sorted;  -- display the result

The Pig version builds the same result step by step. You create two relations, customers and sales, loading each with its schema. You then filter the customers by location, in this case Texas. Both relations are joined on the c_id column, and the amounts are summed per c_id. Finally, you isolate the customers who spend more than $2,000 and sort them in descending order.

In the next section of this Apache Pig tutorial, you will learn how to load and store data in the Pig engine using the command console.


Loading and Storing Methods in Pig

In order to load and store data in the Pig engine, we use the loading and storing methods explained below.

Loading

Loading refers to reading data from the file system into a Pig relation. This is done using the keyword LOAD, with the loaded data assigned to a relation variable, as shown in the example below.

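A minimal sketch of the LOAD syntax, assuming a comma-delimited file at a hypothetical HDFS path:

movies = LOAD '/data/movies.csv' USING PigStorage(',')
    AS (id:int, title:chararray, rating:double);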

A series of transformation statements processes the data.

Storing

Storing refers to writing output to the file system. This is done using the keyword STORE, followed by the name of the variable whose data is to be stored, along with the storage location, as shown in the example below.


You can use the keyword DUMP to display the output on the screen instead.
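Continuing the sketch above, the result can either be persisted with STORE or printed with DUMP:

STORE movies INTO '/output/movies' USING PigStorage(',');  -- write to the file system
DUMP movies;  -- display the output on the screen instead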

Pig Script Interpretation

Pig processes Pig Latin statements in the following manner:

  • Pig validates the syntax and semantics of all statements.
  • It type-checks the fields against the schema.
  • It verifies references and performs limited optimization before execution.
  • When Pig encounters a DUMP or STORE, it executes the statements.

A Pig Latin script’s execution plan consists of logical, optimized logical, physical, and MapReduce plans, as shown in the diagram below.

[Diagram: Pig Latin script execution plan]
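You can inspect these plans for any relation with the EXPLAIN command; for example, for the relation D from the earlier script:

grunt> EXPLAIN D;  -- prints the logical, physical, and MapReduce plans for D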

Various Operations Performed by Developers

Some of the operations performed by Big Data and Hadoop developers are listed below; a short sketch follows the list.

  • Filtering: Filtering refers to selecting records based on a conditional clause, such as grade or pay.
  • Transforming: Transforming refers to reshaping the data to extract the logical fields needed.
  • Grouping: Grouping refers to generating groups of meaningful data.
  • Sorting: Sorting refers to arranging the data in ascending or descending order.
  • Combining: Combining refers to performing a union of the data stored in two or more relations.
  • Splitting: Splitting refers to separating the data into two or more relations based on a logical condition.
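The following sketch shows each of these operations in Pig Latin; the employee files, fields, and thresholds are hypothetical:

emp = LOAD 'emp.dat' AS (name:chararray, dept:chararray, grade:int, pay:double);
temp = LOAD 'temp.dat' AS (name:chararray, dept:chararray, grade:int, pay:double);
seniors = FILTER emp BY grade > 5;  -- filtering
annual = FOREACH emp GENERATE name, pay * 12 AS yearly_pay;  -- transforming
by_dept = GROUP emp BY dept;  -- grouping
ranked = ORDER emp BY pay DESC;  -- sorting
staff = UNION emp, temp;  -- combining
SPLIT emp INTO high IF pay > 5000.0, low IF pay <= 5000.0;  -- splitting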

In the next section of this Pig tutorial, we will look at some Pig commands that are frequently used by analysts.

Pig Commands

Given below are some frequently used Pig commands and their functions.

  • load: Reads data from the file system
  • store: Writes data to the file system
  • foreach: Applies expressions to each record and outputs one or more records
  • filter: Applies a predicate and removes records that do not return true
  • group/cogroup: Collects records with the same key from one or more inputs
  • join: Joins two or more inputs based on a key
  • order: Sorts records based on a key
  • distinct: Removes duplicate records
  • union: Merges data sets
  • split: Splits data into two or more sets based on filter conditions
  • stream: Sends all records through a user-provided binary
  • dump: Writes output to stdout
  • limit: Limits the number of records

Getting Datasets for Pig Development

Listed below are some popular URLs from which you can download datasets for Pig development.

  • Books: http://www.gutenberg.org/ (for example, war_and_peace.text)
  • Wikipedia database: https://dumps.wikimedia.org/enwiki/
  • Open datasets on Amazon S3: https://aws.amazon.com/datasets/
  • National climate data: http://cdo.ncdc.noaa.gov/qclcd_ascii

To summarize the tutorial:

  • Pig in Hadoop is a high-level data-flow scripting language with two major components: a runtime engine and the Pig Latin language.
  • Pig runs in two execution modes: local and MapReduce.
  • The Pig engine can be installed by downloading it from a mirror linked on the website pig.apache.org.
  • Three prerequisites must be met before setting up the environment for Pig Latin: all Hadoop services are running properly, Pig is completely installed and configured, and all required datasets are uploaded to HDFS.


Next Step to Success

To learn more and get an in-depth understanding of Hadoop, you can enroll in the Big Data Engineer Master’s Program. This program, offered in collaboration with IBM, provides online training on the popular skills required for a successful career in data engineering. Master the Hadoop Big Data framework, leverage the functionality of AWS services, and use the database management tool MongoDB to store data.
