Name: Apache Pig
Rating: 8.4 (22 reviews)
Author: Apache

Overview

What is Apache Pig?

Apache Pig is a programming tool for creating MapReduce programs used in Hadoop.

Recent Reviews

TrustRadius Insights

December 14, 2023

Apache Pig has proven to be an invaluable tool for data engineers working with large datasets in the Apache Hadoop ecosystem. Users have …

A great ETL tool for your big data

9 out of 10

May 23, 2022

Incentivized

We are working on a large data analytics project where we have to work on big data, large datasets, and databases. We have used Apache Pig …

"Apache Pig Is A Fantastic High-level Scripting Language To Operate With Big Data Sets."

8 out of 10

April 09, 2022

Incentivized

Apache Pig's language is called Pig Latin; it provides a high-level scripting language to perform data analysis, code generation, and …

Apache Pig

7 out of 10

April 07, 2022

We mainly use Apache Pig for its capabilities that allow us to easily create data pipelines. It also comes with its native language Pig …

Apache Pig - lot to improve

7 out of 10

April 28, 2021

Incentivized

Apache Pig and its query language (Pig Latin) allowed us to create data pipelines with ease and are heavily used by our teams. The language …

Useful ETL scripting tool

8 out of 10

March 20, 2020

Incentivized

Pig is used by data engineers as a stopgap between setting up a Spark environment and having more declarative flexibility than HiveQL …

Apache pig - the easier to implement map reducer

8 out of 10

October 08, 2018

Incentivized

Apache Pig is being used as a map-reduce platform. It is used to handle transportation problems and process large volumes of data. It can …

My Apache Pig Review

7 out of 10

June 22, 2018

Incentivized

As a requirement of a distributed processing system, we are using Apache Pig within our Information Technology department. I use it to an …

Apache Pig - Is it the tool for the job? Maybe, but probably not.

7 out of 10

January 18, 2018

Incentivized

Apache Pig is one of the distributed processing technologies we are using within the engineering department as a whole and we are …

Apache Pig - a good toolkit to have in your hadoop ETL toolbox

8 out of 10

July 21, 2016

Incentivized

Yes, it is used by our data science and data engineering orgs. It is being used to build big data workflows (pipelines) for ETL and …

Product Details

Apache Pig Technical Details

Operating Systems	Unspecified
Mobile Application	No

Reviews and Ratings

(22)

January 31st 2024

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, 3rd-party data sources.

Business Problems Solved
Recommendations

Apache Pig has proven to be an invaluable tool for data engineers working with large datasets in the Apache Hadoop ecosystem. Users have found it to be an excellent high-level scripting language that simplifies working with big data. With Apache Pig, data engineers can build pipelines for advanced analysis and machine learning, writing data operations at a high level that Pig transforms and optimizes into MapReduce jobs.

One of the key advantages of Apache Pig is that users can write complex MapReduce or Spark jobs without deep knowledge of Java, Python, or Groovy. Users value the efficiency and simplicity this brings to their work. Additionally, Apache Pig's query language, Pig Latin, gives users a straightforward way to build data pipelines, eliminate redundant data, and call user-defined functions (UDFs).

The software also gives users control over task execution, which is crucial in a distributed processing system. This control allows users to handle transportation problems and manage large volumes of data, including streaming data from multiple sources and performing joins. Users have utilized Apache Pig to explore and process large datasets in big data analytics projects, performing various operations within a single Java Virtual Machine.

Another key use case for Apache Pig is generating aggregate statistics, refining and filtering logs, and generating reports for both internal use and customer deliveries. Data science and data engineering teams also use Apache Pig to build big data workflows (pipelines) for ETL and analytics. Pig Latin simplifies the creation of these pipelines, and reviewers note features such as table joins that are also found in systems like Hive, a DBMS, and Spark-SQL.

Overall, Apache Pig offers a versatile solution for handling big data tasks in a simple yet efficient manner. Its user-friendly query language and extensive capabilities make it a valuable tool for data engineers working in the Apache Hadoop ecosystem.

Users have provided several recommendations for using Pig as a tool for writing quick big data applications.

One recommendation is that Pig is a good starting point for developing ad-hoc analytics applications, especially for those with basic programming experience in Java.

Another recommendation is to use Pig as a base pipeline for parallelizing and utilizing User-Defined Functions (UDFs) on large datasets. The lazy evaluation feature of Pig allows for efficient program optimization.
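
As a rough illustration of the lazy evaluation mentioned above, the Python sketch below defers every step until a result is requested. This mirrors the general idea reviewers describe, in which aliases are recorded in a DAG and only evaluated on a dump or store. It is not Pig's implementation: the Relation class, its methods, the load helper, and the sample rows are all hypothetical names chosen for this example, and a real engine would also optimize the plan before running it.

```python
from collections import Counter


class Relation:
    """A deferred step in a data flow: nothing runs until dump() is called."""

    def __init__(self, name, compute, parents=()):
        self.name = name        # alias, like the left side of a Pig Latin statement
        self.compute = compute  # function that builds rows from the parents' rows
        self.parents = parents  # upstream relations (edges of the DAG)

    def filter(self, name, predicate):
        # Record a filtering step; the predicate is not applied yet.
        return Relation(name, lambda rows: [r for r in rows if predicate(r)], (self,))

    def group_count(self, name, key):
        # Record a group-and-count step; the counting is not performed yet.
        return Relation(
            name,
            lambda rows: sorted(Counter(key(r) for r in rows).items()),
            (self,),
        )

    def dump(self, log):
        """Evaluate this alias by walking the DAG from its sources."""
        inputs = [parent.dump(log) for parent in self.parents]
        log.append(self.name)  # record that this step actually executed
        return self.compute(*inputs)


def load(name, rows):
    """Hypothetical source step standing in for a LOAD statement."""
    return Relation(name, lambda: list(rows))


executed = []
logs = load("logs", [("GET", 200), ("GET", 500), ("POST", 200), ("GET", 200)])
ok = logs.filter("ok", lambda r: r[1] == 200)
by_method = ok.group_count("by_method", lambda r: r[0])

# Defining the aliases above has not run anything yet.
assert executed == []

result = by_method.dump(executed)
print(result)    # [('GET', 2), ('POST', 1)]
print(executed)  # ['logs', 'ok', 'by_method']
```

Because the whole plan is known before anything runs, an engine built this way has the chance to reorder or merge steps, which is the optimization opportunity the recommendation refers to.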

Users also appreciate Pig's integration with Hadoop, which provides parallelization, fault-tolerance, and relational database features. This makes Pig suitable for applying statistics to datasets, and its functional programming paradigm aligns well with pipeline processes.

Additionally, users suggest considering Spark or Hive as alternative tools for developing pipelines. While Pig may be more suitable for developers with programming experience, it is free and has extensive online documentation available for learning purposes.

Reviews

(1-9 of 9)

May 23, 2022

A great ETL tool for your big data

Verified User

Program Manager in Information Technology

Information Technology & Services Company, 201-500 employees

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Pros and Cons

It provides great support for large datasets and ad-hoc reporting.
It has nearly the full set of operators for actions such as Join, Sort, Merge, etc.
Anybody can use Apache Pig with some initial training, and it feels very familiar to SQL users.
It can handle almost all structured and unstructured data.
Apache Pig is built around data flows, so users can easily see all the processes and information.

One of the most important limitations of Apache Pig is that it does not support OLTP (Online Transaction Processing); it only supports OLAP (Online Analytical Processing).
Apache Pig has very high latency compared to MapReduce.
Apache Pig is designed for ETL and thus not perfectly suited for real-time analysis.
The training materials are hard to learn from and need improvement.

April 09, 2022

"Apache Pig Is A Fantastic High-level Scripting Language To Operate With Big Data Sets."

Sourov K Chowdhury

Database Software Engineer

Best Web Design Ltd. (Information Technology & Services, 11-50 employees)

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Pros and Cons

Its performance, ease of use, and simplicity in learning and deployment.
Using this tool, we can quickly analyze large amounts of data.
It's well suited to map-reducing large datasets, and it fully abstracts MapReduce.

Debugging errors in Pig consumes most of the development time because the tool can be unstable and immature.
It is significantly more challenging to learn and master than Hive. It's a little slower than Spark.

April 07, 2022

Apache Pig

Verified User

C-Level Executive in Product Management

Entertainment Company, 51-200 employees

Score 7 out of 10

Vetted Review

Verified User

Pros and Cons

Useful for map-reducing huge datasets
Easy to learn and deploy
Optimization is better than in comparable products.

The pace of introducing new features is very slow.
The community is also relatively small because it is still at an early stage.
Debug functionality is missing, and it is compile-time only.

April 28, 2021

Apache Pig - lot to improve

Verified User

Engineer in Engineering

Internet Company, 5001-10,000 employees

Score 7 out of 10

Vetted Review

Verified User

Incentivized

Pros and Cons

Data pipeline and aggregation
Log parsing and reporting
Combining MapReduce jobs

Pig lacks support for the advanced features that Apache Spark provides
Well outdated
Debugging in Pig is complex

March 20, 2020

Useful ETL scripting tool

Jordan Moore

Software Consultant

Avalon Consulting (Information Technology and Services, 51-200 employees)

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Pros and Cons

Iterative Development - you can write aliases/variables, which are not immediately executed and these are stored in a DAG, which is only evaluated upon dumping or storing another alias.
Fast execution - Works with MapReduce, Tez, or Spark execution frameworks to provide fast run times at large scales.
Local and remote interoperability - Testing a script on a small dataset locally before moving to the full dataset can be done simply with "pig -x local".

General syntax for the FOREACH ... GENERATE feature is confusing for nested actions.
The docs are hard to navigate, but it is made up for by reasonable examples.
A version less than 1.0 doesn't instill confidence in the product that has been around for over half a decade (as of writing).

October 08, 2018

Apache pig - the easier to implement map reducer

Subhadipto Poddar

Research Assistant

Iowa State University (Higher Education, 5001-10,000 employees)

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Pros and Cons

Fast
Easy to implement
Can process data of almost any size
Easy to learn schema

It can only work on trivial arithmetic problems.
Looping across data is unsupported or very difficult
Sequential checks are almost impossible to implement

June 22, 2018

My Apache Pig Review

Kartik Chavan

Data Analyst

The University of Texas at Arlington (Electrical/Electronic Manufacturing, 1001-5000 employees)

Score 7 out of 10

Vetted Review

Verified User

Incentivized

Pros and Cons

Long logic in Java? Apache Pig is a good alternative.
Has a lot of great features, including table joins on many databases like DBMS, Hive, Spark-SQL, etc.
Faster and easier development compared to regular map-reduce jobs.

Python UDF errors are hard to interpret. Developers can struggle for a very long time when they hit these errors.
Being at an early stage, it still has a small community for help in related matters.
It still needs a lot of improvements. Only recently did they add a datetime module for time series, which is a very basic requirement.

January 18, 2018

Apache Pig - Is it the tool for the job? Maybe, but probably not.

Verified User

Engineer in Engineering

Computer Software Company, 51-200 employees

Score 7 out of 10

Vetted Review

Verified User

Incentivized

Pros and Cons

Provides a decent abstraction for Map-Reduce jobs, allowing for a faster result than creating your own MR jobs
Good documentation and resources for learning Pig Latin (the Domain Specific Language of the Apache Pig platform)
Large community allows for easy learning, support, and feature improvements/updates

May not fit every need and a SQL-like abstraction may be more effective for some tasks (look at Spark-SQL, Hive, or even an actual DBMS)
All Pig jobs are written in a Domain Specific Language, so there is not a lot of transferable knowledge
Writing your own User Defined Functions (UDFs) is a nice feature but can be painful to implement in practice
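
For readers unfamiliar with what a UDF looks like, here is a minimal sketch of a Python one, written so it can be tested outside Pig before it is used in a script. It assumes the outputSchema decorator is importable from pig_util, as in Pig's Python UDF support. The fallback decorator, the extract_domain function, and the "domain:chararray" schema are hypothetical and exist only for illustration.

```python
# Inside Pig, the outputSchema decorator is supplied by the runtime; the
# fallback below is a stand-in so the function can be exercised on its own.
try:
    from pig_util import outputSchema
except ImportError:
    def outputSchema(schema):
        def wrap(func):
            func.output_schema = schema  # keep the schema string for inspection
            return func
        return wrap


@outputSchema("domain:chararray")
def extract_domain(email):
    """Return the lower-cased domain of an email address, or None if malformed."""
    # Guard against nulls and malformed values so that bad rows yield None
    # instead of raising an error in the middle of a job.
    if email is None or email.count("@") != 1:
        return None
    local, domain = email.split("@")
    if not local or not domain:
        return None
    return domain.lower()


print(extract_domain("Ann@Example.COM"))
print(extract_domain("not-an-email"))
```

Checking a function like this with plain Python first is one way to reduce the hard-to-read UDF errors reviewers mention. How the file is then registered in a Pig script depends on the Pig version and the chosen Python engine, so consult the Pig UDF documentation for the exact REGISTER statement.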

July 21, 2016

Apache Pig - a good toolkit to have in your hadoop ETL toolbox

Verified User

Team Lead in Engineering

Retail Company, 10,001+ employees

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Pros and Cons

The Apache Pig DSL provides a better alternative to Java MapReduce code, and the instruction set is very easy to learn and master.
It has many advanced features built in, such as joins, secondary sort, many optimizations, predicate push-down, etc.
A few years ago, when Hive was not very advanced (extremely slow), Pig was the go-to solution. Now, with Spark and Hive (after significant updates), the need to learn Apache Pig may be questionable.

Improve Spark support and compatibility
Spark and Hive are already in mainstream use, and both have an instruction set that can be learned and mastered in a matter of days. While Apache Pig used to be a great alternative to writing Java MapReduce, Hive after significant updates is now either equal to or better than Pig.

Hive

Apache Spark

Apache Sqoop

Apache Flume

Google BigQuery

Hadoop

Presto

Apache Hive

Apache Drill

Databricks Lakehouse Platform

Community Insights