Item: Apache Pig
Rating: 7
Author: Verified User

Overall Satisfaction with Apache Pig

Use Cases and Deployment Scope

Apache Pig and its query language, Pig Latin, let us build data
pipelines with ease, and our teams used it heavily. The language is designed to mirror the way data
pipelines are structured: it lets you discard extraneous data early, supports user-defined
functions (UDFs), and offers a lot of control over the data
flow.
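The dataflow style described above can be illustrated with a small sketch. This is plain Python, not Pig Latin. It only mirrors the shape of a typical Pig script: load and parse records, project away unneeded fields, group, aggregate, and apply a user-defined function. The sample log format, the data, and the names `parse`, `kilobytes`, and `pipeline` are hypothetical. The comments name the Pig Latin operators (`LOAD`, `FOREACH ... GENERATE`, `GROUP`, `SUM`) that each step loosely corresponds to.

```python
from collections import defaultdict
from typing import Callable, Iterable, Optional

# Hypothetical sample log lines in the form "user_id,url,bytes".
RAW_LOG = [
    "u1,/home,120",
    "u2,/cart,300",
    "u1,/cart,80",
    "bad line",
    "u3,/home,50",
]

def parse(line: str) -> Optional[dict]:
    """Rough analogue of LOAD ... USING PigStorage(','): split into fields, reject malformed rows."""
    parts = line.split(",")
    if len(parts) != 3:
        return None
    user, url, nbytes = parts
    return {"user": user, "url": url, "bytes": int(nbytes)}

def kilobytes(n: int) -> float:
    """Hypothetical stand-in for a user-defined function (UDF)."""
    return n / 1024

def pipeline(lines: Iterable[str], udf: Callable[[int], float]) -> dict:
    # LOAD: parse every line and drop the ones that failed to parse.
    records = [r for r in map(parse, lines) if r is not None]
    # FOREACH ... GENERATE: keep only the fields we need (discard extraneous data).
    projected = [(r["url"], r["bytes"]) for r in records]
    # GROUP ... BY url: collect byte counts per URL.
    groups = defaultdict(list)
    for url, nbytes in projected:
        groups[url].append(nbytes)
    # FOREACH group GENERATE group, SUM(...): aggregate, then apply the UDF.
    return {url: udf(sum(vals)) for url, vals in sorted(groups.items())}

print(pipeline(RAW_LOG, kilobytes))  # {'/cart': 0.37109375, '/home': 0.166015625}
```

Each step produces a new intermediate collection, so the sketch can be inspected one stage at a time. On Hadoop MapReduce, Pig compiles a comparable script into one or more MapReduce jobs instead of running it in memory.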

Pros and Cons

Pros

Data pipelines and aggregation
Log parsing and reporting
Combining MapReduce jobs

Cons

Pig lacks support for the advanced features that Apache Spark provides
Quite outdated
Debugging in Pig is complex

Most Important Features

Handling unstructured datasets
Collecting, loading, and consolidating data
Apache Pig compiles scripts into an execution plan structured as a DAG (directed acyclic graph), which is where it is at its best.

Return on Investment

It doesn't support every kind of SQL-like abstraction
Its DML-based scripting requires a lot of training
Error handling is not helpful for debugging production issues

Alternatives Considered

Apache Pig can help you get started quickly, and years ago it was one of the best tools available, but it lacks important features that today's data engineering work requires. Pig also has a steeper learning curve because it uses its own language, Pig Latin, whereas Spark can be coded in Python or Java.

Key Insights

Do you think Apache Pig delivers good value for the price?

Yes


Other Software Used

Apache Spark, Apache Hive, Apache Spark MLlib

Likelihood to Recommend

You can write complex MapReduce jobs without deep knowledge of Java, Python, or Scala. Advanced features such as secondary sorting, optimization algorithms, and predicate push-down are very useful. Apache Pig makes it easier to aggregate data at scale than other tools do, and it turns important MapReduce tasks into SQL-like queries.


Apache Pig - a lot to improve