TrustRadius
Apache Pig
18 Ratings
Score 7.3 out of 10

Apache Pig Reviews

Reviews (1-4 of 4)
Kartik Chavan
June 22, 2018

"My Apache Pig Review"

Score 7 out of 10
Vetted Review
Verified User
Because we needed a distributed processing system, we use Apache Pig within our Information Technology department. I use it mainly to generate reports with advanced statistical methods, for both internal and external purposes. Our Data Science and Data Engineering teams use it to build pipelines in our Big Data environment and to conduct further advanced analysis, including for machine learning.
  • Writing long logic in Java? Apache Pig is a good alternative.
  • Has a lot of great features, including table joins across many sources such as a DBMS, Hive, Spark-SQL, etc.
  • Faster and easier development compared to regular MapReduce jobs.
  • Errors from Python UDFs are hard to interpret. A developer who runs into one can struggle with it for a very long time.
  • Because it is still at an early stage, it has a small community to turn to for help.
  • It still needs a lot of improvement. A datetime module for time series, a very basic requirement, was added only recently.
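The Python UDF complaint above is easier to picture with an example. The block below is a minimal sketch, not the reviewer's code, of a UDF written for Pig's streaming Python support; it assumes that `outputSchema` is imported from `pig_util` there and that the file is registered with something like `REGISTER 'udfs.py' USING streaming_python AS myudfs;`. The function name `to_day` and the timestamp format are hypothetical. Re-raising parse failures with the offending value in the message is one way to make errors in the task logs easier to trace back to a bad record, and the fallback decorator exists only so the function can be tested without Pig installed.

```python
from datetime import datetime

# Assumption: under Pig's streaming_python UDFs, outputSchema comes from pig_util.
# Outside Pig, fall back to a no-op stand-in so the function can be unit-tested.
try:
    from pig_util import outputSchema
except ImportError:
    def outputSchema(schema):
        """Stand-in decorator that just records the declared schema."""
        def decorator(func):
            func.output_schema = schema
            return func
        return decorator


@outputSchema("day:chararray")
def to_day(ts):
    """Hypothetical UDF: turn 'YYYY-MM-DD HH:MM:SS' into 'YYYY-MM-DD'.

    A null input (None) passes through as null instead of raising.
    """
    if ts is None:
        return None
    try:
        return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d")
    except (TypeError, ValueError) as exc:
        # Include the bad value so the log line points at the offending record
        # rather than showing only a bare parsing traceback.
        raise ValueError("to_day: cannot parse %r: %s" % (ts, exc))
```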
It is a great option for database pipelining, and it is highly effective for working with unstructured datasets. Because Apache Pig is a procedural language, unlike SQL, it is also easier to learn than other alternatives. Even so, I would recommend alternatives like Apache Spark because of their wide range of advanced libraries, which save the extra effort of writing everything from scratch.
Subhadipto Poddar
October 08, 2018

Review: "Apache pig - the easier to implement map reducer"

Score 8 out of 10
Vetted Review
Verified User
We use Apache Pig as a MapReduce platform to handle transportation problems involving large volumes of data. It can take in data streaming from multiple sources and join them. We use it to extract key findings and aggregate results, and the final output feeds different types of visualizations.
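The join-then-aggregate flow described above can be sketched in plain Python. This is not Pig code: it is an in-memory analogue of Pig's `JOIN`, `GROUP` and `FOREACH ... GENERATE` operators, and the trip and route records, field names and function name are hypothetical stand-ins for transportation data. Pig would run the same logic as distributed MapReduce jobs over files in HDFS.

```python
from collections import defaultdict


def join_and_aggregate(trips, routes):
    """Inner-join trips to routes on route_id, then report
    (trip_count, average_minutes) per city.

    trips:  iterable of (route_id, minutes)  (hypothetical schema)
    routes: iterable of (route_id, city)     (hypothetical schema)
    """
    # Build side of an in-memory hash join (similar in spirit to Pig's
    # 'replicated' join, which keeps the smaller relation in memory).
    cities_by_route = defaultdict(list)
    for route_id, city in routes:
        cities_by_route[route_id].append(city)

    # Probe side plus GROUP BY city: accumulate [count, total_minutes].
    totals = defaultdict(lambda: [0, 0])
    for route_id, minutes in trips:
        # Inner join: trips whose route_id has no match are dropped.
        for city in cities_by_route.get(route_id, []):
            totals[city][0] += 1
            totals[city][1] += minutes

    # FOREACH ... GENERATE: one output row per group.
    return {city: (count, total / count) for city, (count, total) in totals.items()}
```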
  • Fast
  • Easy to implement
  • Can process data of almost any size
  • Easy to learn schema
  • It can only work on trivial arithmetic problems.
  • Looping across data is very difficult or impossible
  • Sequential checks are almost impossible to implement
It is well suited to aggregating data, but really difficult to use when the aggregation has to proceed line by line. Apache Pig can be picked up in a few days with a few demonstrations. Code can be written quickly; however, it becomes difficult to take on complicated tasks with it.
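The aggregate-versus-line-by-line distinction comes down to ordering. Pig relations are unordered bags, so a check that compares each record with the previous one usually means grouping, sorting inside a nested `FOREACH` block, and often a UDF to walk the sorted bag. The plain-Python sketch below shows the kind of sequential check meant here; the vehicle/timestamp records, the `max_gap` threshold and the function name are hypothetical.

```python
from collections import defaultdict


def find_gaps(events, max_gap):
    """Flag consecutive pings from the same vehicle that are more than
    max_gap time units apart. Returns (vehicle, previous_ts, ts) tuples."""
    # Step 1 (like GROUP): collect timestamps per vehicle.
    by_vehicle = defaultdict(list)
    for vehicle, ts in events:
        by_vehicle[vehicle].append(ts)

    gaps = []
    for vehicle in sorted(by_vehicle):
        # Step 2 (like a nested ORDER): input order is not guaranteed, so sort.
        stamps = sorted(by_vehicle[vehicle])
        # Step 3 (the sequential part): compare each record with the one before it.
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > max_gap:
                gaps.append((vehicle, prev, cur))
    return gaps
```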
January 18, 2018

Review: "Apache Pig - Is it the tool for the job? Maybe, but probably not."

Score 7 out of 10
Vetted Review
Verified User
Apache Pig is one of the distributed processing technologies we are using within the engineering department as a whole and we are currently using it mainly to generate aggregate statistics from logs, run additional refinement and filtering on certain logs, and to generate reports for both internal use and customer deliveries.
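As a rough illustration of the log filtering and aggregation described above, the plain-Python function below keeps only error lines and counts them per day. The tab-separated `timestamp, level, message` layout and the function name are assumptions, not the reviewer's format. The comments show the Pig Latin statements it roughly corresponds to; on a cluster, Pig would compile those into MapReduce jobs instead of looping in one process.

```python
from collections import Counter


def error_counts_by_day(lines):
    """Count ERROR lines per day in tab-separated 'timestamp<TAB>level<TAB>message' logs."""
    # Roughly equivalent Pig Latin (hypothetical file and field names):
    #   logs   = LOAD 'logs' USING PigStorage('\t') AS (ts:chararray, level:chararray, msg:chararray);
    #   errors = FILTER logs BY level == 'ERROR';
    #   by_day = GROUP errors BY SUBSTRING(ts, 0, 10);
    #   counts = FOREACH by_day GENERATE group AS day, COUNT(errors) AS n;
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines instead of failing the whole run
        ts, level, _msg = parts
        if level == "ERROR":
            counts[ts[:10]] += 1  # 'YYYY-MM-DD' prefix of the timestamp
    return dict(counts)
```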
  • Provides a decent abstraction for Map-Reduce jobs, allowing for a faster result than creating your own MR jobs
  • Good documentation and resources for learning Pig Latin (the Domain Specific Language of the Apache Pig platform)
  • Large community allows for easy learning, support, and feature improvements/updates
  • May not fit every need and a SQL-like abstraction may be more effective for some tasks (look at Spark-SQL, Hive, or even an actual DBMS)
  • All Pig jobs are written in a Domain Specific Language, so there is not a lot of transferable knowledge
  • Writing your own User Defined Functions (UDFs) is a nice feature but can be painful to implement in practice
Apache Pig is well suited to an ongoing data pipeline where a team of engineers familiar with the technology is already in place. At this point I would consider it relatively deprecated, since more suitable technologies offer more robust and flexible APIs and are easier to learn and apply. For ad-hoc needs, I would recommend Hive or Spark-SQL if a SQL-esque language makes sense; otherwise, use Spark with a notebook technology such as Apache Zeppelin. For production data pipelines, I would recommend Apache Spark over Apache Pig for its performance, ease of use, and libraries.
July 21, 2016

Review: "Apache Pig - a good toolkit to have in your hadoop ETL toolbox"

Score 8 out of 10
Vetted Review
Verified User
Our data science and data engineering orgs use it to build big data workflows (pipelines) for ETL and analytics. It provides an easier and better alternative to writing Java MapReduce code.
  • The Apache Pig DSL provides a better alternative to Java MapReduce code, and its instruction set is very easy to learn and master.
  • It has many advanced features built in, such as joins, secondary sort, many optimizations, predicate push-down, etc.
  • A few years ago, when Hive was not very advanced (extremely slow), Pig was the go-to solution. Now that Spark and Hive have seen significant updates, the need to learn Apache Pig may be questionable.
  • Improve Spark support and compatibility
  • Spark and Hive are already in mainstream use, and both have an instruction set that can be learned and mastered in a matter of days. While Apache Pig used to be a great alternative to writing Java MapReduce, Hive after significant updates is now equal to or better than Pig.
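Secondary sort, one of the built-in features listed above, means a reducer receives each key's values already ordered by a second field, because the shuffle sorts on a composite key but groups on the first part only. In Pig this typically comes from an `ORDER` nested inside a `FOREACH` block, which Pig can push into the MapReduce sort instead of sorting each bag in memory. The Python sketch below only simulates the idea (sort once on `(user, timestamp)`, then group by user) with hypothetical records; it is not Pig's implementation.

```python
from itertools import groupby, islice
from operator import itemgetter


def latest_n_per_user(records, n):
    """Return the n most recent timestamps per user.

    records: iterable of (user, ts) with numeric ts (hypothetical schema).
    """
    # One sort on the composite key (user ascending, ts descending) plays the
    # role of the shuffle's secondary sort.
    ordered = sorted(records, key=lambda r: (r[0], -r[1]))
    # Grouping on the primary key alone: each group's values arrive pre-sorted,
    # so taking the first n needs no per-group sort.
    return {
        user: [ts for _, ts in islice(group, n)]
        for user, group in groupby(ordered, key=itemgetter(0))
    }
```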
- When custom load, store, or filter functionality is needed and writing Java MapReduce code is not an option because it is susceptible to bugs.
- When multiple MR jobs need to be chained into one Pig job.
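Chaining is a core convenience here: a Pig script is a sequence of named relations, and Pig plans the whole script into however many MapReduce jobs it needs and runs them in order, so nobody wires one job's output to the next job's input by hand. The Python sketch below mimics only that composition idea in a single process; the `chain` helper and the three stages are hypothetical, and Pig's planner does considerably more (for example, deciding which statements can share a job).

```python
from collections import Counter
from functools import reduce


def chain(*stages):
    """Compose stages left to right into a single callable pipeline."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


# Hypothetical stages; in a Pig script each would roughly be one statement.
def parse(lines):
    # Split 'key,value' lines, ignoring blank ones.
    return [line.strip().split(",") for line in lines if line.strip()]


def keep_valid(rows):
    # Keep only rows with exactly a key and a digits-only value.
    return [row for row in rows if len(row) == 2 and row[1].isdigit()]


def total_by_key(rows):
    # Sum values per key (a GROUP followed by SUM in Pig terms).
    totals = Counter()
    for key, value in rows:
        totals[key] += int(value)
    return dict(totals)


pipeline = chain(parse, keep_valid, total_by_key)
```

Calling `pipeline(["a,1", "b,2", "a,3", "bad", ""])` runs all three stages in order; the malformed and blank lines are dropped before the totals are computed.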

About Apache Pig

Apache Pig is a programming tool for creating MapReduce programs used in Hadoop.
Categories: Hadoop-Related

Apache Pig Technical Details

Operating Systems: Unspecified
Mobile Application: No