Apache Pig Reviews and Ratings

Rating: 8.4 out of 10

Community insights

TrustRadius Insights for Apache Pig are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.

Business Problems Solved

Apache Pig has proven to be an invaluable tool for data engineers working with large datasets in the Apache Hadoop ecosystem. Users describe it as an excellent high-level scripting language that simplifies working with big data. With Apache Pig, data engineers can easily build pipelines for advanced analysis and machine learning, while Pig compiles and optimizes their data operations into MapReduce jobs.

One of the key advantages of Apache Pig is that it lets users write complex MapReduce or Spark jobs without deep knowledge of Java, Python, or Groovy. Users value the efficiency and simplicity this brings to their work. Additionally, Apache Pig's query language, Pig Latin, gives users a straightforward way to build data pipelines that discard redundant data and support user-defined functions (UDFs).

The software also gives users control over task execution, which is crucial in a distributed processing system. Users have applied it to transportation problems involving large volumes of data, including data streaming in from multiple sources that must be joined. Users have also used Apache Pig to explore and process large datasets in big data analytics projects, including running it in a local execution mode within a single Java Virtual Machine.

Another key use case for Apache Pig is generating aggregate statistics, running refinement and filtering on logs, and generating reports for both internal use and customer deliveries. Data science and data engineering teams also use Apache Pig to build big data workflows (pipelines) for ETL and analytics. The software simplifies building these pipelines through its native language, Pig Latin, which brings together features found in systems such as Hive, traditional DBMSs, and Spark-SQL.

Overall, Apache Pig offers a versatile solution for handling big data tasks in a simple yet efficient manner. Its user-friendly query language and extensive capabilities make it a valuable tool for data engineers working in the Apache Hadoop ecosystem.

Reviews

9 Reviews

A great ETL tool for your big data

Rating: 9 out of 10

Incentivized

May 23, 2022

Use Cases and Deployment Scope

We are working on a large data analytics project involving big data, large datasets, and databases. We use Apache Pig because it helps us explore and process large datasets. It also offers a local execution environment that runs in a single Java Virtual Machine. Apache Pig is fairly easy to learn and use, and its data structures are richer and can be nested. We have used it extensively whenever we needed analytical insights from our sample data.

Pros

It provides great support to large datasets and ad-hoc reporting.
It has nearly the full set of operators to perform actions such as Join, Sort, Merge, etc.
Anybody can use Apache Pig with some initial training, and it will feel very familiar to anyone who knows SQL.
It can handle almost all structured and unstructured data.
Apache Pig is built around data flows, so users can easily see all the processes and information.

Cons

One of the most important limitations of Apache Pig is that it does not support OLTP (Online Transaction Processing), as it only supports OLAP (Online Analytical Processing).
Apache Pig has very high latency compared to MapReduce.
Apache Pig is designed for ETL and thus not perfectly suited for real-time analysis.
The training materials are hard to follow and need improvement.

Likelihood to Recommend

Apache Pig is best suited for ETL-based data processes. It performs well when handling and analyzing large amounts of data, and it gives faster results than any other similar tool. It is easy to implement, and any user with some initial training or prior SQL knowledge can work with it. Apache Pig also has a large global community.

Verified User

Program Manager in Information Technology (201-500 employees)

Vetted Review

1 year of experience

"Apache Pig Is A Fantastic High-level Scripting Language To Operate With Big Data Sets."

Rating: 8 out of 10

Incentivized

April 9, 2022

Use Cases and Deployment Scope

Apache Pig provides a high-level scripting language, Pig Latin, for data analysis, code generation, and manipulation, and it is excellent for working with large data sets. It runs on Hadoop, Apache's open-source project, so we can transform data operations into optimized MapReduce jobs, which can be difficult on other platforms. We quickly and easily built data pipelines using its query language. It eliminates redundant data, supports user-defined functions (UDFs), and controls data flow well. What I like best about Apache Pig is how efficiently it lets you write complex MapReduce or Spark jobs without deep knowledge of Java, Python, or Groovy. Furthermore, with Pig it is simple to maintain control over the execution of a task.

Pros

Its performance, ease of use, and simplicity in learning and deployment.
Using this tool, we can quickly analyze large amounts of data.
It's adequate for map-reducing large datasets and fully abstracts MapReduce.

Cons

Debugging errors in Pig consumes most of the development time, because it can be unstable and immature.
It is significantly more challenging to learn and master than Hive. It's a little slower than Spark.

Likelihood to Recommend

Apache Pig is a lightweight framework that is simple to learn and put into production. It lets you express MapReduce tasks as SQL-like queries. It also reduces the data and performs some simple mathematical functions, and it is incredibly useful for combining data. With Apache Pig's DateTime functions, we can get quicker results. In our case, it processes 150-180 GB monthly datasets and reduces them in a few minutes. However, it cannot perform sequential operations, such as comparing consecutive lines. Another flaw is that it doesn't allow loops and nested loops to span more than one variable at a time. Still, I'd say go for it!

Sourov K Chowdhury

Database Software Engineer in Product Management at Best Web Design Ltd. (11-50 employees)

Vetted Review

1 year of experience

Apache Pig

Rating: 7 out of 10

Incentivized

April 7, 2022

Use Cases and Deployment Scope

We mainly use Apache Pig for its capabilities that allow us to easily create data pipelines. It also comes with its native language, Pig Latin, which makes code execution easy to manage. It brings together important features of database systems such as Hive, traditional DBMSs, and Spark-SQL.

Pros

Useful for running MapReduce over huge datasets
Easy to learn and deploy
Better optimization compared to similar products

Cons

The pace of introducing new features is very slow.
The community is also relatively small because it is still at an early stage.
Debugging functionality is missing, and it is compile-time only.

Likelihood to Recommend

Debugging the code for errors and functionality is very time-consuming, which wastes development hours and leads to low-quality code. Since it is at an early stage, community support is also much weaker than for other products.

Verified User

C-Level Executive in Product Management (51-200 employees)

Vetted Review

2 years of experience

Apache Pig - a lot to improve

Rating: 7 out of 10

Incentivized

April 28, 2021

Use Cases and Deployment Scope

Apache Pig and its query language (Pig Latin) allowed us to create data pipelines with ease, and it is heavily used by our teams. The language is designed to reflect the way data pipelines are designed, so it discards extraneous data, supports user-defined functions (UDFs), and offers a lot of control over the data flow.

Pros

Data pipeline and aggregation
Log parsing and reporting
Combine Map Reduce jobs

Cons

Pig lacks support for the advanced features that Apache Spark provides
Quite outdated
Debugging in Pig is complex

Likelihood to Recommend

Write complex MapReduce jobs without deep knowledge of Java, Python, or Scala. Advanced features such as secondary sorting, optimization algorithms, and predicate push-down techniques are very useful. With Apache Pig, it's easier to aggregate data at scale than with other tools. It lets you express important MapReduce tasks as SQL-like queries.

Verified User

Engineer in Engineering (5001-10,000 employees)

Vetted Review

5 years of experience

Useful ETL scripting tool

Rating: 8 out of 10

Incentivized

March 20, 2020

Use Cases and Deployment Scope

Pig is used by data engineers as a stopgap between setting up a Spark environment and having more declarative flexibility than HiveQL while moving away from MapReduce. It solves the problem of needing to iteratively transform and migrate data between supported Hadoop environments while being able to debug the process at each step.

Pros

Iterative Development - you can define aliases (variables) that are not executed immediately; they are stored in a DAG that is only evaluated when you dump or store an alias.
Fast execution - Works with MapReduce, Tez, or Spark execution frameworks to provide fast run times at large scales.
Local and remote interoperability - Testing a script on a small dataset locally before moving to the full dataset is as simple as running "pig -x local".
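To make the iterative-development point concrete, here is a minimal Python analogy of deferred execution. It is not Pig's implementation: the `Alias` class and the `load`, `filter_by`, and `dump` functions are hypothetical stand-ins for Pig Latin's LOAD, FILTER, and DUMP statements. Defining an alias only records a node and its parents; nothing is computed until `dump` walks the graph.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Row = Tuple[str, int]


class Alias:
    """Hypothetical stand-in for a Pig alias: one node in a lazily evaluated DAG."""

    def __init__(self, name: str,
                 compute: Callable[[List[List[Row]]], List[Row]],
                 parents: Optional[List["Alias"]] = None) -> None:
        self.name = name
        self.compute = compute        # builds this relation from its parents' rows
        self.parents = parents or []  # upstream aliases this node depends on

    def evaluate(self, trace: List[str]) -> List[Row]:
        # Evaluate upstream aliases first, then this one (depth-first).
        inputs = [parent.evaluate(trace) for parent in self.parents]
        trace.append(self.name)       # record execution order for illustration
        return self.compute(inputs)


def load(name: str, rows: Iterable[Row]) -> Alias:
    data = list(rows)
    return Alias(name, lambda _inputs: list(data))


def filter_by(name: str, parent: Alias, pred: Callable[[Row], bool]) -> Alias:
    # Only records the step; the predicate is not called here.
    return Alias(name, lambda inputs: [r for r in inputs[0] if pred(r)], [parent])


def dump(alias: Alias) -> List[Row]:
    # Execution happens only here, loosely mirroring Pig's DUMP/STORE.
    trace: List[str] = []
    result = alias.evaluate(trace)
    print("executed:", trace)
    return result


if __name__ == "__main__":
    raw = load("raw", [("a", 1), ("b", 5), ("c", 9)])
    big = filter_by("big", raw, lambda r: r[1] > 3)  # nothing has run yet
    print(dump(big))
```

Running it prints `executed: ['raw', 'big']` followed by `[('b', 5), ('c', 9)]`. Unlike Pig, this toy re-evaluates shared parents on every call and does no plan optimization; it only illustrates why defining aliases is cheap until a DUMP or STORE forces execution.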

Cons

General syntax for the FOREACH ... GENERATE feature is confusing for nested actions.
The docs are hard to navigate, but reasonable examples make up for it.
A version number below 1.0 doesn't instill confidence in a product that has been around for over half a decade (as of writing).

Likelihood to Recommend

If someone wants to process data without access to platforms such as Spark or Flink, and wants to do so in a minimal, portable fashion that simply requires learning a new scripting language, then Pig is great. It also supports running the same code against a cluster as on a single developer machine for testing.

Pig is more suited for batch ETL workloads, not ML or streaming big data use cases.

Jordan Moore

Software Consultant in Information Technology at Avalon Consulting (51-200 employees)

Vetted Review

2 years of experience

Apache Pig - the easier-to-implement map reducer

Rating: 8 out of 10

Incentivized

October 8, 2018

Use Cases and Deployment Scope

Apache Pig is being used as a MapReduce platform. We use it to handle transportation problems involving large volumes of data. It can handle data streaming in from multiple sources and join it. This can be used to extract key findings, aggregate results, and finally produce output that is used for different types of visualizations.

Pros

Fast
Easy to implement
Can process data of almost any size
Easy to learn schema

Cons

It can only work on trivial arithmetic problems.
Looping across data is very difficult or impossible
Sequential checks are almost impossible to implement

Likelihood to Recommend

It is well suited for aggregating data but really difficult to use if you want to aggregate on a line-by-line basis. Apache Pig can be picked up in a few days with a few demonstrations. Code can be written quickly; however, it becomes difficult to take on complicated tasks with it.

Subhadipto Poddar

Research Assistant in Engineering at Iowa State University (5001-10,000 employees)

Vetted Review

2 years of experience

My Apache Pig Review

Rating: 7 out of 10

Incentivized

June 22, 2018

Use Cases and Deployment Scope

We use Apache Pig within our Information Technology department because our work requires a distributed processing system. I use it mainly to generate reports with advanced statistical methods, for both internal and external purposes. Our Data Science and Data Engineering teams use it to build pipelines in a Big Data environment for further advanced analysis, including machine learning.

Pros

Long logic in Java? Apache Pig is a good alternative.
Has a lot of great features, including table joins, like those in DBMSs, Hive, Spark-SQL, etc.
Faster and easier development compared to regular MapReduce jobs.

Cons

Errors from Python UDFs are not interpretable. A developer can struggle for a very long time when hitting these errors.
Being at an early stage, it still has a small community for help with related matters.
It still needs a lot of improvement. A datetime module for time series, which is a very basic requirement, was added only recently.

Likelihood to Recommend

It is a great option for database pipelining and is highly effective for working with unstructured datasets. Also, because Apache Pig is a procedural language, unlike SQL, it is easy to learn compared to other alternatives. Still, I would recommend alternatives such as Apache Spark because of their wide availability of advanced libraries, which saves the extra effort of writing code from scratch.

Kartik Chavan

Data Analyst in Information Technology at The University of Texas at Arlington (1001-5000 employees)

Vetted Review

1 year of experience

Apache Pig - Is it the tool for the job? Maybe, but probably not.

Rating: 7 out of 10

Incentivized

January 18, 2018

Use Cases and Deployment Scope

Apache Pig is one of the distributed processing technologies we are using within the engineering department as a whole and we are currently using it mainly to generate aggregate statistics from logs, run additional refinement and filtering on certain logs, and to generate reports for both internal use and customer deliveries.
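As a rough illustration of the log aggregation described here, the Python sketch below mimics a typical Pig flow (LOAD, FILTER, GROUP, then FOREACH ... GENERATE with COUNT) on an in-memory list. The tab-separated log format, the field names, and the function name `count_errors_by_host` are assumptions for illustration only; in Pig, the same steps would be compiled into distributed jobs rather than run in a single process.

```python
from collections import Counter
from typing import Dict, Iterable


def count_errors_by_host(lines: Iterable[str]) -> Dict[str, int]:
    """Count ERROR records per host, mimicking a LOAD/FILTER/GROUP/COUNT flow.

    Assumes a hypothetical tab-separated format: host, level, message.
    """
    # LOAD ... USING PigStorage('\t'): split each line into fields.
    fields = (line.rstrip("\n").split("\t") for line in lines)
    # This sketch skips malformed lines that do not have exactly three fields.
    valid = (f for f in fields if len(f) == 3)
    # FILTER ... BY level == 'ERROR'
    errors = (f for f in valid if f[1] == "ERROR")
    # GROUP ... BY host; FOREACH ... GENERATE group, COUNT(...)
    counts = Counter(host for host, _level, _message in errors)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    logs = [
        "web1\tERROR\tdisk full",
        "web2\tINFO\tok",
        "web1\tERROR\ttimeout",
        "web2\tERROR\tcrash",
    ]
    print(count_errors_by_host(logs))  # {'web1': 2, 'web2': 1}
```

The generator pipeline keeps each stage separate, which loosely matches how a Pig script names each intermediate relation as its own alias.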

Pros

Provides a decent abstraction for Map-Reduce jobs, allowing for a faster result than creating your own MR jobs
Good documentation and resources for learning Pig Latin (the Domain Specific Language of the Apache Pig platform)
Large community allows for easy learning, support, and feature improvements/updates

Cons

May not fit every need and a SQL-like abstraction may be more effective for some tasks (look at Spark-SQL, Hive, or even an actual DBMS)
All Pig jobs are written in a Domain Specific Language so not a lot of transferable knowledge
Writing your own User Defined Functions (UDFs) is a nice feature but can be painful to implement in practice
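For reference, here is a minimal Python UDF sketch. It assumes Pig's streaming_python UDF support (Pig 0.12 and later, to our understanding), where the `outputSchema` decorator is imported from `pig_util`; the fallback decorator exists only so the function can be unit-tested outside Pig, and the function name `status_class` is hypothetical. Check the UDF documentation for your Pig version before relying on these details.

```python
try:
    # Provided by Pig when the file is registered USING streaming_python.
    from pig_util import outputSchema
except ImportError:
    def outputSchema(schema):
        """No-op stand-in so the UDF can be tested without Pig installed."""
        def decorator(func):
            return func
        return decorator


@outputSchema("status_class:chararray")
def status_class(status):
    """Map an HTTP status code such as 404 to its class, e.g. '4xx'.

    Returns None (a Pig null) for missing, non-numeric, or out-of-range input.
    """
    if status is None:
        return None
    try:
        code = int(status)
    except (TypeError, ValueError):
        return None
    if not 100 <= code <= 599:
        return None
    return "%dxx" % (code // 100)
```

In a script, it would be registered with something like `REGISTER 'status_udf.py' USING streaming_python AS myudfs;` (the file name is hypothetical) and called as `myudfs.status_class(status)` inside a FOREACH ... GENERATE. Handling None and bad input explicitly avoids the hard-to-interpret Python UDF errors that some reviewers mention.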

Likelihood to Recommend

Apache Pig is well suited as part of an ongoing data pipeline where a team of engineers familiar with the technology is already in place. At this point, I would consider it relatively deprecated, since there are more suitable technologies with more robust and flexible APIs that are also easier to learn and apply. For ad-hoc needs, I would recommend Hive or Spark-SQL if a SQL-esque language makes sense; otherwise, use Spark with a notebook technology such as Apache Zeppelin. For production data pipelines, I would recommend Apache Spark over Apache Pig for its performance, ease of use, and libraries.

Verified User

Engineer in Engineering (51-200 employees)

Vetted Review

2 years of experience

Apache Pig - a good toolkit to have in your Hadoop ETL toolbox

Rating: 8 out of 10

Incentivized

July 21, 2016

Use Cases and Deployment Scope

Yes, it is used by our data science and data engineering orgs to build big data workflows (pipelines) for ETL and analytics. It provides an easier and better alternative to writing Java MapReduce code.

Pros

The Apache Pig DSL provides a better alternative to Java MapReduce code, and the instruction set is very easy to learn and master.
It has many advanced features built in, such as joins, secondary sort, many optimizations, predicate push-down, etc.
A few years ago, when Hive was not very advanced (and extremely slow), Pig was always the go-to solution. Now, with Spark and Hive (after significant updates), the need to learn Apache Pig may be questionable.

Cons

Improve Spark support and compatibility
Spark and Hive are already used in the mainstream, and both have an instruction set that is easier to learn and can be mastered in a matter of days. While Apache Pig used to be a great alternative to writing Java MapReduce, Hive, after significant updates, is now equal to or better than Pig.

Likelihood to Recommend

- When custom load, store, or filter functionality is needed and writing Java MapReduce code is not an option because it is susceptible to bugs.

- To chain multiple MR jobs into one Pig job.

Verified User

Team Lead in Engineering (10,001+ employees)

Vetted Review

4 years of experience
