Item: Apache Pig
Rating: 8
Author: Verified User

Overall Satisfaction with Apache Pig

Use Cases and Deployment Scope

Apache Pig is used by our data science and data engineering organizations to build big data workflows (pipelines) for ETL and analytics. It provides an easier alternative to writing Java MapReduce code.

Pros and Cons

Pros

The Apache Pig DSL provides a better alternative to Java MapReduce code, and its instruction set is very easy to learn and master.
It has many advanced features built in, such as joins, secondary sort, predicate push-down, and many other optimizations.
A few years ago, when Hive was not very advanced (and extremely slow), Pig was always the go-to solution. Now, with Spark and a significantly updated Hive, the need to learn Apache Pig may be questionable.

Cons

Improve Spark support and compatibility
Spark and Hive are already mainstream, and both have an instruction set that is easier to learn and can be mastered in a matter of days. While Apache Pig used to be a great alternative to writing Java MapReduce, Hive, after significant updates, is now equal to or better than Pig.

Return on Investment

The ROI was definitely positive in the beginning, but it is hard to say the same now given the advancements in Hive and Spark.

Alternatives Considered

Apache Hive and Apache Spark

- Pig used to provide better ways than Hive to build optimized Hadoop jobs, but not anymore.
- The Spark DSL is much more advanced, and compute times are significantly lower.

Other Software Used

Apache Spark, Apache Hive

Likelihood to Recommend

- When custom load, store, or filter functionality is needed and writing Java MapReduce code is not an option because it is susceptible to bugs.
- When multiple MR jobs need to be chained into one Pig job.
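
To illustrate the chaining scenario, here is a minimal Python sketch (not Pig itself, and not code from our pipelines) of what wiring two map-reduce stages together by hand involves. The helper names map_reduce, tokenize, sum_counts, invert, and collect_words are hypothetical, and everything runs in memory on a toy input. In Pig, a comparable flow would typically be a few statements in one script, with Pig planning the underlying jobs.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one in-memory map -> shuffle -> reduce stage (illustrative only)."""
    groups = defaultdict(list)
    # Map phase: each record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Shuffle is implicit in the grouping above; reduce keys in sorted order.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Stage 1: count how often each word appears.
def tokenize(line):
    for word in line.lower().split():
        yield word, 1


def sum_counts(word, ones):
    yield word, sum(ones)


# Stage 2: regroup the stage 1 output by count.
def invert(pair):
    word, count = pair
    yield count, word


def collect_words(count, words):
    yield count, sorted(words)


def words_by_frequency(lines):
    """Chain the two stages: stage 1 output is the stage 2 input."""
    counts = map_reduce(lines, tokenize, sum_counts)
    return map_reduce(counts, invert, collect_words)


if __name__ == "__main__":
    sample = ["pig hive spark", "pig spark", "pig"]
    for count, words in words_by_frequency(sample):
        print(count, words)
```

Even in this toy form, the intermediate data format between stages has to be agreed on by hand, which is the kind of plumbing that a single Pig job takes care of.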

Apache Pig - a good toolkit to have in your Hadoop ETL toolbox