Apache Pig - a good toolkit to have in your Hadoop ETL toolbox
July 21, 2016
Score 8 out of 10
Vetted Review
Verified User
Overall Satisfaction with Apache Pig
Yes, Apache Pig is used by our data science and data engineering organizations to build big data workflows (pipelines) for ETL and analytics. It is an easier and better alternative to writing Java MapReduce code.
- The Apache Pig DSL (Pig Latin) is a better alternative to Java MapReduce code, and its instruction set is very easy to learn and master.
- It has many advanced features built in, such as joins, secondary sort, predicate push-down, and many other optimizations.
- Spark support and compatibility need improvement.
- The ROI was definitely positive in the beginning, but it is hard to say the same now because of advancements in Hive and Spark.
- Pig used to provide better ways than Hive to optimize Hadoop jobs, but that is no longer the case.
- The Spark DSL is much more advanced, and its compute times are significantly lower.