Apache Pig vs. Apache Spark

Overview
Apache Pig
Rating: Score 8.4 out of 10
Most Used By: N/A
Product Summary: Apache Pig is a programming tool for creating MapReduce programs used in Hadoop.
Starting Price: N/A
Apache Spark
Rating: Score 8.7 out of 10
Most Used By: N/A
Product Summary: N/A
Starting Price: N/A
Pricing
Editions & Modules
Apache Pig: No answers on this topic
Apache Spark: No answers on this topic
Offerings
Free Trial
Apache Pig: No
Apache Spark: No
Free/Freemium Version
Apache Pig: No
Apache Spark: No
Premium Consulting/Integration Services
Apache Pig: No
Apache Spark: No
Entry-level Setup Fee
Apache Pig: No setup fee
Apache Spark: No setup fee
Community Pulse
Considered Both Products
Apache Pig
Chose Apache Pig
I use both Apache Pig and its alternatives, such as Apache Spark and Apache Hive. Apache Pig was one of the best options in Big Data's initial stages, but alternatives have since taken over the market, leaving Apache Pig behind the competition. But it is still a better alternative …
Chose Apache Pig
It takes me less time to write a Pig script than to get a Spark program running for batch ETL workloads. Compared to Spark, Pig has a steeper learning curve because it uses its own scripting language, Pig Latin. In one script and one file, it can handle both MapReduce and Hadoop. …
Chose Apache Pig
Pig is more focused on scripting in its own Pig Latin language than on integrating into another language such as Java, Scala, Python, or SQL.
However, for batch ETL workloads, I find that I can write a Pig script quicker than setting up and deploying a Spark program, for example.
Chose Apache Pig
Early on, Apache Pig was a great tool for easily writing distributed processing applications without needing to write a complete Java MapReduce job from scratch, but as time has moved on there are now better alternatives to get results faster for both ad-hoc analysis and for …
Chose Apache Pig
- Pig used to provide better ways to optimize Hadoop jobs than Hive, but that is no longer the case.
- Spark's DSL is much more advanced, and compute times are significantly shorter.
Apache Spark
Chose Apache Spark
Spark has primarily replaced my use of writing pure Hadoop MapReduce or Apache Pig jobs for processing data. I like the fact that I can alternate between the main programming languages that I know - Java and Python - and use those to learn the Scala API. Spark also can be …
Chose Apache Spark
Apache Pig and Apache Hive provide most of what Spark provides, but Apache Spark has more features, such as actions and transformations, which are easy to code. Spark applies optimization techniques, as we can select the driver program and manipulate the DAG (Directed Acyclic Graph)
Python …
Chose Apache Spark
Even with Python, MapReduce requires lengthy code. Combining Python with Apache Spark not only shortens the code but also speeds up the algorithms. Occasionally, I use MapReduce, but Apache Spark will replace MapReduce very soon. It has many …
Chose Apache Spark
Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and …
Best Alternatives
Small Businesses
Apache Pig: No answers on this topic
Apache Spark: No answers on this topic
Medium-sized Companies
Apache Pig: Cloudera Manager (Score 9.7 out of 10)
Apache Spark: Cloudera Manager (Score 9.7 out of 10)
Enterprises
Apache Pig: IBM Analytics Engine (Score 9.3 out of 10)
Apache Spark: IBM Analytics Engine (Score 9.3 out of 10)
User Ratings
Likelihood to Recommend
Apache Pig: 8.1 (9 ratings)
Apache Spark: 9.9 (24 ratings)
Likelihood to Renew
Apache Pig: - (0 ratings)
Apache Spark: 10.0 (1 rating)
Usability
Apache Pig: 10.0 (1 rating)
Apache Spark: 10.0 (3 ratings)
Support Rating
Apache Pig: 6.0 (1 rating)
Apache Spark: 8.7 (4 ratings)
User Testimonials
Likelihood to Recommend
Apache Pig
Apache Pig is best suited for ETL-based data processes. It performs well when handling and analyzing large amounts of data. It gives faster results than any other similar tool. It is easy to implement, and any user with some initial training or some prior SQL knowledge can work with it. Apache Pig also has a large community base globally.
Apache Spark
Well suited: Local runs on most datasets and non-production systems, where scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks.
Less appropriate: We had to work on a recommender system (RecSys) where the music dataset we used was over 300 GB in size. We faced memory-based issues, and a few times we also got memory errors. MLlib also lacks support for advanced analytics and deep-learning frameworks. Understanding the internals of Apache Spark is very difficult for beginners.
Pros
Apache Pig
  • Its performance, ease of use, and simplicity in learning and deployment.
  • Using this tool, we can quickly analyze large amounts of data.
  • It is well suited to running MapReduce over large datasets, and it fully abstracts MapReduce.
Apache Spark
  • Apache Spark makes processing very large data sets possible. It handles these data sets fairly quickly.
  • Apache Spark does a fairly good job implementing machine learning models for larger data sets.
  • Apache Spark seems to be rapidly advancing software, with new features making it ever more straightforward to use.
Cons
Apache Pig
  • Errors from Python UDFs are hard to interpret. A developer can struggle for a very long time when these errors occur.
  • Being at an early stage, it still has only a small community to turn to for help.
  • It still needs a lot of improvement. A datetime module for time series, which is a very basic requirement, was added only recently.
Apache Spark
  • Memory management is very weak.
  • PySpark is not as robust as Spark with Scala.
  • High availability (HA) for the Spark master is needed. It is not as highly available as it should be.
  • Data locality should not be a necessity, although it does help performance. I would prefer not to depend on locality.
Likelihood to Renew
Apache Pig
No answers on this topic
Apache Spark
Its capacity to process data across a cluster, and its speed.
Usability
Apache Pig
Apache Pig is quick and easy to implement, which makes it quite popular.
Apache Spark
The only thing I dislike about Spark's usability is the learning curve: there are many actions and transformations. However, its wide range of uses for ETL processing, its ease of integration, and its multi-language support make this library a powerhouse for data science solutions. It has especially helped us with its lightning-fast processing times.
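For readers new to the "actions and transformations" mentioned in this review, the distinction can be illustrated with a toy model in plain Python. This is only a sketch of the general idea (transformations are recorded lazily, and an action triggers the work); the `LazyDataset` class and its methods are hypothetical names made up for this illustration and are not Spark's API.

```python
import builtins
from typing import Any, Callable, Iterable, Iterator, List, Tuple


class LazyDataset:
    """Toy stand-in for a dataset (hypothetical; NOT Spark's API)."""

    def __init__(self, source: Iterable[Any],
                 steps: Tuple[Tuple[str, Callable], ...] = ()):
        self._source = source
        self._steps = steps  # recorded transformations, not yet executed

    # Transformations: record a step and return a new dataset. No work is done.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def _run(self) -> Iterator[Any]:
        # Chain the recorded steps over the source using the builtin iterators.
        items: Iterator[Any] = iter(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                items = builtins.map(fn, items)
            else:
                items = builtins.filter(fn, items)
        return items

    # Actions: execute the recorded steps and return a concrete result.
    def collect(self) -> List[Any]:
        return list(self._run())

    def count(self) -> int:
        return sum(1 for _ in self._run())


calls: List[int] = []  # records every element that square() actually processes


def square(x: int) -> int:
    calls.append(x)
    return x * x


# Two transformations are chained; nothing has been computed at this point.
odd_squares = LazyDataset(range(1, 6)).map(square).filter(lambda v: v % 2 == 1)
calls_before = list(calls)  # still empty: square() has not run yet

# The action triggers the whole pipeline.
result = odd_squares.collect()  # [1, 9, 25]
```

In this sketch, `map` and `filter` play the role of transformations and `collect` and `count` play the role of actions. The real framework adds distribution, partitioning, and query planning on top of this basic pattern, which is part of why the learning curve is steeper than the toy suggests.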
Support Rating
Apache Pig
The documentation is adequate. I'm not sure how large an external community there is for support.
Apache Spark
  1. It integrates very well with Scala or Python.
  2. Its SQL interoperability is very easy to understand.
  3. Apache Spark is much faster than competing technologies.
  4. The Apache community provides a great deal of support for Spark.
  5. Execution times are faster compared to others.
  6. There are a large number of forums available for Apache Spark.
  7. Code for Apache Spark is readily available and easy to access.
  8. Many organizations use Apache Spark, so many solutions are available for existing applications.
Alternatives Considered
Apache Pig
Apache Pig might help to get things started faster at first, and it was one of the best tools years ago, but it lacks important features that are needed in the data engineering world right now. Pig also has a steeper learning curve, since it uses its own language (Pig Latin), compared to Spark, which can be coded in Python or Java.
Apache Spark
All the above systems work quite well on big data transformations, whereas Spark really shines with its bigger API support and its ability to read from and write to multiple data sources. Using Spark, one can easily switch between declarative, imperative, and functional programming styles depending on the situation. Also, it doesn't need special data ingestion or indexing pre-processing like Presto. Combining it with Jupyter Notebooks (https://github.com/jupyter-incubator/sparkmagic), one can develop Spark code interactively in Scala or Python.
Return on Investment
Apache Pig
  • Higher learning curve than other similar technologies, so onboarding new engineers or changing ownership of Apache Pig code tends to be a bit of a headache.
  • Once the language is learned and understood, it can be relatively straightforward to write simple Pig scripts, so development can go relatively quickly with a skilled team.
  • As distributed technologies grow and improve, Apache Pig overall feels left in the dust and is more legacy code to support than something to actively develop with.
Apache Spark
  • Faster turnaround on feature development: we have seen a noticeable improvement in our agile development since adopting Spark.
  • Easy adoption: having multiple departments use the same underlying technology, even when the use cases are very different, allows for more commonality among applications, which definitely makes the operations team happy.
  • Performance: we have been able to make some applications run over 20x faster since switching to Spark. This has saved us time, headaches, and operating costs.