Apache Pig and its query language (Pig Latin) allowed us to create data pipelines with ease, and both are heavily used by our teams. Pig Latin is designed to reflect the way data pipelines are actually built: it discards extraneous data, supports user-defined functions (UDFs), and offers a lot of control over the data flow.
- Data pipeline and aggregation
- Log parsing and reporting
- Combining MapReduce jobs
- Pig lacks support for the advanced features that Apache Spark provides
- Quite outdated
- Debugging in Pig is complex
Pig lets you write complex MapReduce jobs without deep knowledge of Java, Python, or Scala. Advanced features such as secondary sorting, optimization algorithms, and predicate push-down are very useful. Compared to other tools, Apache Pig makes it easy to aggregate data at scale. It turns important MapReduce tasks into SQL-like queries.
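To give a rough feel for what "MapReduce tasks as SQL-like queries" means, here is a small Python sketch of the map, shuffle, and reduce steps that a short Pig Latin aggregation stands in for. The sample records, field names, and function names are all made up for illustration, and this is not how Pig actually compiles or executes a script; it only mirrors the general idea on in-memory data.

```python
from collections import defaultdict

# A Pig Latin script for this aggregation might look roughly like
# (relation and field names are hypothetical):
#
#   logs   = LOAD 'access_log' AS (user:chararray, url:chararray, bytes:long);
#   by_url = GROUP logs BY url;
#   stats  = FOREACH by_url GENERATE group AS url,
#                                    COUNT(logs) AS hits,
#                                    SUM(logs.bytes) AS total_bytes;

# Hypothetical sample records: (user, url, bytes)
RECORDS = [
    ("alice", "/home", 120),
    ("bob", "/home", 80),
    ("alice", "/cart", 300),
]


def map_phase(records):
    """Emit (key, value) pairs keyed by URL, dropping the unused user field."""
    for _user, url, nbytes in records:
        yield url, nbytes


def shuffle_phase(pairs):
    """Collect all values that share a key, as the shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce each group to (hit count, total bytes)."""
    return {url: (len(values), sum(values)) for url, values in groups.items()}


def url_stats(records):
    """Run the three phases end to end on in-memory data."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    for url, (hits, total_bytes) in sorted(url_stats(RECORDS).items()):
        print(url, hits, total_bytes)
```

The three-statement script in the comment expresses the same grouping and aggregation that the hand-written phases spell out, which is the convenience the review is pointing at.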