Apache Pig - a good toolkit to have in your Hadoop ETL toolbox
July 21, 2016

Anonymous | TrustRadius Reviewer
Score 8 out of 10

Overall Satisfaction with Apache Pig

Apache Pig is used by our data science and data engineering orgs to build big data workflows (pipelines) for ETL and analytics. It provides an easier and better alternative to writing Java MapReduce code.
  • The Apache Pig DSL is a better alternative to Java MapReduce code, and its instruction set is very easy to learn and master.
  • It has many advanced features built in, such as joins, secondary sort, predicate push-down, and many other optimizations.
  • A few years ago, when Hive was not very advanced (it was extremely slow), Pig was the go-to solution. Now, with Spark and Hive (after significant updates), the need to learn Apache Pig may be questionable.
  • Spark support and compatibility could be improved.
  • Spark and Hive are already mainstream, and both have an instruction set that can be learned and mastered in a matter of days. While Apache Pig used to be a great alternative to writing Java MapReduce, Hive, after significant updates, is now equal to or better than Pig.
  • The ROI was definitely positive in the beginning, but it is hard to say the same now given the advancements in Hive and Spark.
- It used to provide better ways to optimize Hadoop jobs than Hive, but not anymore.
- The Spark DSL is much more advanced, and compute times are significantly lower.
- Custom load, store, and filter functionality is needed, and writing Java MapReduce code is not an option because it is susceptible to bugs.
- Multiple MapReduce (MR) jobs need to be chained into one Pig job.
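
To illustrate what chaining multiple MR jobs into one Pig job means, here is a minimal in-memory sketch in Python. It is not Pig and not Hadoop: the helper run_mapreduce, the stage functions, and the sample data are all hypothetical, made up only to show two map/shuffle/reduce stages where the output of the first feeds the second. The Pig statements in the comment are an illustrative equivalent, with an assumed input path and field names.

```python
from collections import defaultdict

# Roughly what a short Pig script expresses (illustrative; path and names assumed):
#   logs    = LOAD 'logs' AS (user:chararray, bytes:long);
#   grouped = GROUP logs BY user;
#   totals  = FOREACH grouped GENERATE group AS user, SUM(logs.bytes) AS total;
#   ordered = ORDER totals BY total DESC;

def run_mapreduce(records, mapper, reducer):
    """Run one in-memory map -> shuffle -> reduce stage (hypothetical helper)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: collect values per key
    output = []
    for key in sorted(groups):             # reducers see keys in sorted order
        output.extend(reducer(key, groups[key]))
    return output

# Stage 1: total bytes per user.
def map_bytes(record):
    user, nbytes = record
    yield user, nbytes

def reduce_sum(user, values):
    yield user, sum(values)

# Stage 2: order users by total, descending.
# Keying on the negated total makes ascending key order a descending ranking.
def map_by_total(record):
    user, total = record
    yield -total, user

def reduce_emit(neg_total, users):
    for user in sorted(users):             # break ties alphabetically
        yield user, -neg_total

def top_users(records):
    """Chain the two stages: the output of stage 1 is the input of stage 2."""
    totals = run_mapreduce(records, map_bytes, reduce_sum)
    return run_mapreduce(totals, map_by_total, reduce_emit)

logs = [("ann", 10), ("bob", 5), ("ann", 7), ("cy", 5)]
print(top_users(logs))
```

In hand-written Java MapReduce, each stage would typically be its own job with its own driver code and intermediate output to manage, whereas in Pig the whole chain is a handful of statements in one script.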