I use both Apache Pig and alternatives like Apache Spark and Apache Hive. Apache Pig was one of the best options in Big Data's initial stages, but alternatives have since taken over the market, leaving Apache Pig behind in the competition. But it is still a better alternative …
It takes me less time to write a Pig script than to get a Spark program running for batch ETL workloads. Compared to Spark, Pig has a steeper learning curve because it employs a proprietary programming language. In one script and one file, it can handle both MapReduce and Hadoop. …
Early on, Apache Pig was a great tool for easily writing distributed processing applications without needing to write a complete Java MapReduce job from scratch, but as time has moved on, there are now better alternatives that get results faster, both for ad-hoc analysis and for …
- Provided better ways to optimize Hadoop jobs than Hive, but not anymore.
- Spark's DSL is much more advanced, and compute times are significantly lower.
For me, Spark has largely replaced writing pure Hadoop MapReduce or Apache Pig jobs to process data. I like that I can alternate between the main programming languages that I know - Java and Python - and use those to learn the Scala API. Spark also can be …
Apache Pig and Apache Hive provide most of what Spark provides, but Apache Spark has more features, such as actions and transformations, which are easy to code. Spark uses optimization techniques: we can select the driver program and manipulate the DAG (Directed Acyclic Graph). Python …
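For example, a minimal PySpark sketch of the transformation/action distinction (the input path and column names are hypothetical): the transformations are lazy, and only the final action makes Spark build, optimize, and execute the DAG.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-demo").getOrCreate()

# Transformations: each of these is lazy and only extends the DAG.
df = spark.read.json("events.json")
errors = df.filter(df.level == "ERROR")
by_host = errors.groupBy("host").count()

# Action: this is what actually triggers optimization and execution.
by_host.show()

spark.stop()
```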
All the above systems work quite well on big data transformations, whereas Spark really shines with its broader API support and its ability to read from and write to multiple data sources. Using Spark, one can easily switch between declarative versus imperative versus functional …
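As a sketch of the multi-source point (the paths and formats here are illustrative, not from a real pipeline), the same DataFrame API reads one format and writes two others without changing any processing code in between:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("io-demo").getOrCreate()

# Read a CSV with a header row, then persist the same data
# as Parquet and as JSON through the same write API.
df = spark.read.option("header", "true").csv("sales.csv")
df.write.mode("overwrite").parquet("out/sales_parquet")
df.write.mode("overwrite").json("out/sales_json")

spark.stop()
```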
Even with Python, MapReduce means lengthy code. Combining Python with Apache Spark not only shortens the code, it also effectively increases the speed of the algorithms. Occasionally I still use MapReduce, but Apache Spark will replace MapReduce very soon. It has many …
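To illustrate the brevity (the input path is hypothetical), here is a complete word count in PySpark; the equivalent hand-written Java MapReduce job typically runs to dozens of lines across mapper, reducer, and driver classes.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()

# The whole job: split lines into words, pair each with 1, sum by key.
counts = (
    spark.sparkContext.textFile("input.txt")
    .flatMap(lambda line: line.split())
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
)

print(counts.take(10))
spark.stop()
```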
Apache Pig is best suited for ETL-based data processing. It performs well in handling and analyzing large amounts of data, and it gives faster results than other similar tools. It is easy to implement, and any user with some initial training or some prior SQL knowledge can work with it. Apache Pig is proud to have a large community base globally.
Well suited: for most local runs of datasets and non-prod systems, scalability is not a problem at all. Being able to pull in data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks.
Less appropriate: we had to work on a RecSys where the music dataset we used was around 300+ GB in size. We faced memory-based issues, and a few times we also got out-of-memory errors. The MLlib library also lacks support for advanced analytics and deep-learning frameworks. Understanding the internals of how Apache Spark works is very difficult for beginners.
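For the kind of ML task where MLlib does fit, a minimal sketch with toy in-memory data (nothing like the 300+ GB dataset above; column names are made up): a feature assembler feeding a logistic regression inside a pipeline.

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mllib-demo").getOrCreate()

# Toy training data: two numeric features and a binary label.
train = spark.createDataFrame(
    [(0.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.5, 0.5, 1.0)],
    ["f1", "f2", "label"],
)

# Assemble the feature columns into a vector, then fit the model.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
model = Pipeline(stages=[assembler, lr]).fit(train)

model.transform(train).select("label", "prediction").show()
spark.stop()
```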
If the team looking to use Apache Spark is not used to debugging and tweaking job settings to ensure maximum optimization, it can be frustrating. However, the documentation and the support of the community on the internet can help resolve most issues. Moreover, it is highly configurable, and it integrates with different tools (e.g., it can be used by dbt Core), which increases the scenarios where it can be used.
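A sketch of the kind of tweaking meant here; these are real Spark properties, though the values are only illustrative, not recommendations for any particular workload.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuned-job")
    # Shuffle parallelism: the default of 200 is rarely right for a job.
    .config("spark.sql.shuffle.partitions", "400")
    # Per-executor heap; setting it too low is a common cause of OOM failures.
    .config("spark.executor.memory", "8g")
    # Adaptive query execution re-optimizes plans at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)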
1. It integrates very well with Scala or Python.
2. Its SQL interoperability is very easy to understand (see the sketch below).
3. Apache Spark is way faster than the other competing technologies.
4. The support from the Apache community is huge for Spark.
5. Execution times are faster compared to the others.
6. There are a large number of forums available for Apache Spark.
7. Code for Apache Spark is simple and easy to gain access to.
8. Many organizations use Apache Spark, so many solutions are available for existing applications.
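A minimal sketch of the SQL interoperability in point 2 (the table and column names are made up): the same data is visible to both plain SQL and the Python API.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-interop").getOrCreate()

df = spark.createDataFrame(
    [("NYC", 10.0), ("NYC", 5.0), ("SF", 7.5)], ["city", "amount"]
)

# Register the DataFrame as a temporary view and query it in plain SQL.
df.createOrReplaceTempView("sales")
spark.sql(
    "SELECT city, SUM(amount) AS total FROM sales GROUP BY city"
).show()

spark.stop()
```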
Apache Pig might help you start things faster at first, and it was one of the best tools years back, but it lacks important features that are needed in the data engineering world right now. Pig also has a steeper learning curve, since it uses a proprietary language, compared to Spark, which can be coded in Python or Java.
Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing.
Higher learning curve than other similar technologies, so onboarding new engineers or changing ownership of Apache Pig code tends to be a bit of a headache.
Once the language is learned and understood, it can be relatively straightforward to write simple Pig scripts, so development can go relatively quickly with a skilled team.
As distributed technologies grow and improve, overall Apache Pig feels left in the dust and is more legacy code to support than something to actively develop with.