Apache Spark vs. Hive

Apache Spark

Hive

Overview
Product	Rating	Most Used By	Product Summary	Starting Price
Apache Spark	Score 8.9 out of 10	N/A	Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.	N/A
Hive	Score 9.0 out of 10	N/A	Hive Technology offers their eponymous project management and process management application, providing integrations with many popularly used applications for productivity, cloud storage, and collaboration.	$0

Pricing

Editions & Modules

Apache Spark
No answers on this topic

Hive
Free: $0
Lite: $24 per month per user
Growth: $34 per month per user
Pro: $59 per month per user
Elite: Contact Sales

Offerings

Pricing Offerings	Apache Spark	Hive
Free Trial	No	Yes
Free/Freemium Version	No	Yes
Premium Consulting/Integration Services	No	No

Entry-level Setup Fee

No setup fee

Additional Details

—

A discount is offered for annual pricing.

Community Pulse
	Apache Spark	Hive
Considered Both Products

Apache Spark

Ananth Gouri, Assistant Professor (chose Apache Spark, incentivized review)
We used Surprise Kit for one of the other research works. It is more fine-tuned to Recommendation systems and their algorithms. Apache Spark has MLlib for majority of ML problems. Whereas software like Surprise Kit is suitable for a specific task of Recommendations only.

Riyaz Khan, Staff Engineer (chose Apache Spark, incentivized review)
Apache Spark is a fast-processing in-memory computing framework. It is 10 times faster than Apache Hadoop. Earlier we were using Apache Hadoop for processing data on the disk but now we are shifted to Apache Spark because of its in-memory computation capability. Also in SAP …

Steven Li, Senior Software Developer (Consultant) (chose Apache Spark, incentivized review)
Other teams used to work on Apache Hadoop but our team started with Apache Spark directly.

Verified User, Anonymous (chose Apache Spark, incentivized review)
There are a few alternatives that can do the same transformation and aggregation like Apache Spark can do but most of them are not able to perform parallel computation. For example, pandas is a really good tool to do that but not parallelized; However, there are some tools that …

Surendranatha Reddy Chappidi, Senior Data Engineer (chose Apache Spark, incentivized review)
Apache Spark works in distributed mode using a cluster. Informatica and Datastage cannot scale horizontally. We can write custom code in Spark, whereas in Datastage and Informatica we can only choose the different features provided already.

Verified User, Anonymous (chose Apache Spark, incentivized review)
Apache Spark has much better performance and features if we compare with Hive or map/reduce kind of solutions. Spark has many other features for machine learning, streaming.

Chetan Munegowda, Software Engineer (chose Apache Spark, incentivized review)
Spark is simply awesome to work on with any data sets and also has an in-memory database which makes it very flexible.

Yogesh Mhasde, Technical Manager (chose Apache Spark, incentivized review)
1. Apache Spark is almost 100 % faster than Hadoop. 2. Apache Spark is more stable than Amazon EMR. 3. The end to end distributed machine library is more robust in Apache Spark.

Verified User, Anonymous (chose Apache Spark, incentivized review)
Databricks uses Spark as a foundation, and is also a great platform. It does bring several add-ons, which we did not feel needed by the time we evaluated - and haven't needed since then. One interesting plus in our opinion was the engineering support, which is great depending …

Verified User, Anonymous (chose Apache Spark, incentivized review)
It is easy to learn, read and to maintain. It brings the best of the Ruby on Rails framework from Java that helps to create a web service so easily. Communication is one of the most distinctive features of Apache Spark compared to alternative products. You are able to …

Shiv Shivakumar, Acquisitions Leader (chose Apache Spark, incentivized review)
We evaluated SAS alongside with Apache Spark but during the course of proof of concept found that Apache Spark was able to support the hadoop eco-system and hadoop file system much better. It was much faster at that time while having the ability to process data quickly for the …

Carla Borges, Consultor Tecnico - Java Developer and Php Developer (chose Apache Spark, incentivized review)
I prefer Apache Spark compared to Hadoop, since in my experience Spark has more usability and comes equipped with simple APIs for Scala, Python, Java and Spark SQL, as well as provides feedback in REPL format on the commands. At the same time, Apache Spark seems to have the …

Nitin Pasumarthy, Software Engineer (chose Apache Spark, incentivized review)
All the above systems work quite well on big data transformations whereas Spark really shines with its bigger API support and its ability to read from and write to multiple data sources. Using Spark one can easily switch between declarative versus imperative versus functional …

Kartik Chavan, Data Analyst (chose Apache Spark, incentivized review)
Even with Python, MapReduce is lengthy coding. Combination of Python with Apache Spark will not only shorten the code, but it will effectively increase the speed of algorithms. Occasionally, I use MapReduce, but Apache Spark will replace MapReduce very soon. It has many …

Anson Abraham, Data Czar (chose Apache Spark, incentivized review)
vs MapReduce, it was faster and easier to manage. Especially for Machine Learning, where MapReduce is lacking. Also Apache Storm was slower and didn't scale as much as Spark does. Spark elasticity was easier to apply compared to storm and MapReduce. managing resources for …

Verified User, Anonymous (chose Apache Spark, incentivized review)
We specifically choose Spark over MapReduce to make the cluster processing faster

Verified User, Anonymous (chose Apache Spark, incentivized review)
Spark in comparison to similar technologies ends up being a one stop shop. You can achieve so much with this one framework instead of having to stitch and weave multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and …

Kamesh Emani, Software Developer Intern (chose Apache Spark, incentivized review)
Apache Pig and Apache Hive provide most of the things spark provide but apache spark has more features like actions and transformations which are easy to code. Spark uses optimization technique as we can select driver program and manipulate DAG (Directed Acyclic Graph) Python …

Verified User, Anonymous (chose Apache Spark, incentivized review)
There are a few newer frameworks for general processing like Flink, Beam, frameworks for streaming like Samza and Storm, and traditional Map-Reduce. I think Spark is at a sweet spot where its clearly better than Map-Reduce for many workflows yet has gotten a good amount of …

Jordan Moore, Staff Consultant (chose Apache Spark, incentivized review)
Spark has primarily replaced my use of writing pure Hadoop MapReduce or Apache Pig jobs for processing data. I like the fact that I can alternate between the main programming languages that I know - Java and Python - and use those to learn the Scala API. Spark also can be …

Hive

Leonardo Nery, Project Analyst (chose Hive, incentivized review)
In my experience, Hive is better for the work and not so good for financial. That's why we use monday.

Verified User, Anonymous (chose Hive, incentivized review)
One key difference between Hive and Spark is the way they process data. Hive is a batch-oriented system, which means that it is designed to process large amounts of data in a batch mode rather than in real-time. In contrast, Spark is a real-time processing platform that is …

Verified User, Anonymous (chose Hive, incentivized review)
Hive's layout was much smoother and nicer to use.

Madison Maione, SVP (chose Hive, incentivized review)
More user-friendly. Able to quickly get users adopted and utilizing the platform versus Planview. More intuitive, especially for the user that is not familiar with project management software. This platform was built for everyday users.

Vicky Negi, Lead UI/UX (chose Hive, incentivized review)
Hive is a bit different than Jira and Monday, which I used mostly. Overall does a great job managing project and helps with team communication. Removes dependency of asking team members for updates by going to conference rooms. With Hive, the team updates the status, and we …

Verified User, Anonymous (chose Hive, incentivized review)
Hive did what these other tools do. It has Kanban boards, Gantt views, timeline views, reporting, task management, and file uploads. While it is not as feature rich at the lowest subscription level as some of these others, its interface is quite a bit less overwhelming than say …

Verified User, Anonymous (chose Hive, incentivized review)
Easier to deploy. Better UI/UX. Easy to customize.

Verified User, Anonymous (chose Hive, incentivized review)
Hive for me felt more complex and granular in comparison to other competitors which was a good thing. I enjoyed the layout of viewing projects, the way it integrated timesheets, resourcing, and budgets together, and worked really well to help track episodes and projects. For …

María José Nieto Hidalgo, Digital Consultant (chose Hive, incentivized review)
It's an all in one, amazing and simple. I've been looking for something like this all along.

Verified User, Anonymous (chose Hive, incentivized review)
So far Hive is the total package for our needs. Offering request forms and proofing/approval out of the box without third party integrations has been a huge upgrade for us along with incredibly reasonable pricing. The support for onboarding has been fantastic and we haven't …

John Bianchi, Director of Accounts (chose Hive, incentivized review)
I would say that in comparison to Asana, Hive is a better interface and UI. I think Asana is more robust in terms of what it can do in conjunction with Confluence but I think Hive is a better entry-level model for new employees. Hive is much simpler and more straight forward and …

Tera Ogorzalek, Director of Marketing (chose Hive, incentivized review)
I prefer Hive because we are able to connect and share with our local network.

Cameron Michael Rhoads, Content Supervisor (chose Hive, incentivized review)
I like Hive better than Trello. Hive is definitely more user-friendly, but Trello had nice shortcuts that I miss in Hive. I would like to organize my board with just one click.

Features

Project Management

Comparison of Project Management features of Apache Spark and Hive
Feature	Apache Spark (no overall score)	Hive (9.1 overall, 17% above category average)
Task Management	0 (0 ratings)	9.0 (0 ratings)
Resource Management	0 (0 ratings)	9.0 (0 ratings)
Gantt Charts	0 (0 ratings)	10.0 (0 ratings)
Scheduling	0 (0 ratings)	7.0 (0 ratings)
Workflow Automation	0 (0 ratings)	9.0 (0 ratings)
Team Collaboration	0 (0 ratings)	10.0 (0 ratings)
Support for Agile Methodology	0 (0 ratings)	10.0 (0 ratings)
Support for Waterfall Methodology	0 (0 ratings)	8.0 (0 ratings)
Document Management	0 (0 ratings)	10.0 (0 ratings)
Email integration	0 (0 ratings)	10.0 (0 ratings)
Mobile Access	0 (0 ratings)	8.0 (0 ratings)
Timesheet Tracking	0 (0 ratings)	10.0 (0 ratings)
Change request and Case Management	0 (0 ratings)	10.0 (0 ratings)
Budget and Expense Management	0 (0 ratings)	7.0 (0 ratings)

Professional Services Automation

Comparison of Professional Services Automation features of Apache Spark and Hive
Feature	Apache Spark (no overall score)	Hive (7.0 overall, 10% below category average)
Quotes/estimates	0 (0 ratings)	7.0 (0 ratings)
Invoicing	0 (0 ratings)	7.0 (0 ratings)
Project & financial reporting	0 (0 ratings)	7.0 (0 ratings)
Integration with accounting software	0 (0 ratings)	7.0 (0 ratings)

Best Alternatives
	Apache Spark	Hive
Small Businesses	No answers on this topic	Stackby Score 8.9 out of 10
Medium-sized Companies	Cloudera Manager Score 9.9 out of 10	InEight Score 8.2 out of 10
Enterprises	IBM Analytics Engine Score 8.6 out of 10	InEight Score 8.2 out of 10

User Ratings
	Apache Spark	Hive
Likelihood to Recommend	9.0 (0 ratings)	9.0 (0 ratings)
Likelihood to Renew	10.0 (0 ratings)	- (0 ratings)
Usability	8.0 (0 ratings)	8.0 (0 ratings)
Support Rating	8.7 (0 ratings)	9.4 (0 ratings)

User Testimonials
	Apache Spark	Hive
Likelihood to Recommend	Apache Spark has rich APIs for regular data transformations, ML workloads, and graph workloads, whereas other systems may not have such a wide range of support. Choose it when you need to perform data transformations for big data as offline jobs, and use MongoDB-like distributed database systems for more real-time queries. (Nitin Pasumarthy, Software Engineer; incentivized review)	Hive is great for managing projects with your team. Assigning tasks is simple enough using Hive. It helps manage team goals for the projects. We are able to create reports (via the dashboard) for the progress and updates to provide to the team based on completed stages. Works great for bigger projects. (Vicky Negi, Lead UI/UX; incentivized review)
Pros	It performs a conventional disk-based process when the data sets are too large to fit into memory, which is very useful because, regardless of the size of the data, it is always possible to store them. It has great speed and the ability to join multiple types of databases and run different types of analysis applications. This functionality is super useful as it reduces work times. Apache Spark uses the data storage model of Hadoop and can be integrated with other big data frameworks such as HBase, MongoDB, and Cassandra. This is very useful because it is compatible with multiple frameworks that the company has, and thus allows us to unify all the processes. (Carla Borges, Consultor Tecnico - Java Developer and Php Developer; incentivized review)	Data warehousing: Hive is often used as a data warehousing platform, allowing users to store and analyze large amounts of structured and semi-structured data. It is especially good at handling data that is too large to be stored and analyzed on a single machine, and supports a wide variety of data formats. Batch processing: Hive is designed for batch processing of large datasets, making it well-suited for tasks such as data ETL (extract, transform, load), data cleansing, and data aggregation. Data transformation: Hive allows users to perform data transformations and manipulations using custom scripts written in Java, Python, or other programming languages. This can be useful for tasks such as data cleansing, data aggregation, and data transformation. Integration with other tools: Hive integrates with a wide variety of other tools and services in the Hadoop ecosystem, such as Pig, Spark, and HBase, allowing users to perform a wide range of data analysis and management tasks. (Verified User, Anonymous; incentivized review)
Cons	Memory management. Very weak on that. PySpark is not as robust as Scala with Spark. Spark master HA is needed. Not as HA as it should be. Locality should not be a necessity, but does help improvement. But would prefer no locality. (Anson Abraham, Data Czar; incentivized review)	Organizing tasks by assignees could be better. It's a little cumbersome to check off each person you want. Can you group these? I don't really use any view besides task view. Is there something better I could be using? It would be nice if attachments showed up in a nicer format, maybe with a preview? (Cameron Michael Rhoads, Content Supervisor; incentivized review)
Likelihood to Renew	Capacity of computing data in cluster and fast speed. (Steven Li, Senior Software Developer (Consultant))	No answers on this topic
Usability	If the team looking to use Apache Spark is not used to debugging and tweaking settings for jobs to ensure maximum optimization, it can be frustrating. However, the documentation and the support of the community on the internet can help resolve most issues. Moreover, it is highly configurable and it integrates with different tools (e.g., it can be used by dbt core), which increases the scenarios where it can be used. (Verified User, Anonymous; incentivized review)	It's an easy tool, the best way to organize the workflow, but it has room for more improvements. (Leonardo Nery, Project Analyst)
Support Rating	1. It integrates very well with scala or python. 2. It's very easy to understand SQL interoperability. 3. Apache is way faster than the other competitive technologies. 4. The support from the Apache community is very huge for Spark. 5. Execution times are faster as compared to others. 6. There are a large number of forums available for Apache Spark. 7. The code availability for Apache Spark is simpler and easy to gain access to. 8. Many organizations use Apache Spark, so many solutions are available for existing applications. (Yogesh Mhasde, Technical Manager)	Our CSR is easily accessible and they have support built into the app itself. They also have a pretty robust support site. We also took advantage of the free trial and learned so much by putting Hive through the paces and figuring out the best way to mold it to our needs. (Verified User, Anonymous; incentivized review)
Alternatives Considered	We used Surprise Kit for one of the other research works. It is more fine-tuned to recommendation systems and their algorithms. Apache Spark has MLlib for the majority of ML problems, whereas software like Surprise Kit is suitable for the specific task of recommendations only. (Ananth Gouri, Assistant Professor; incentivized review)	One key difference between Hive and Spark is the way they process data. Hive is a batch-oriented system, which means that it is designed to process large amounts of data in a batch mode rather than in real-time. In contrast, Spark is a real-time processing platform that is designed to handle streaming data and support interactive queries. Another difference is the way they execute queries. Hive uses a SQL-like query language called HiveQL, while Spark supports a wide range of languages and APIs, including SQL, Python, Scala, and R. But we chose Hive due to its simple queries on large datasets and for data warehousing tasks. (Verified User, Anonymous; incentivized review)
Return on Investment	Faster turnaround on feature development: we have seen a noticeable improvement in our agile development since using Spark. Easy adoption: having multiple departments use the same underlying technology, even if the use cases are very different, allows for more commonality amongst applications, which definitely makes the operations team happy. Performance: we have been able to make some applications run over 20x faster since switching to Spark. This has saved us time, headaches, and operating costs. (Verified User, Anonymous; incentivized review)	I've gotten to know my colleagues better; knowing their roles makes it faster to contact them to complete tasks, and that speed makes us optimize and earn better results. The speed of the jobs made us focus on optimization and customization for the client, and that results in better treatment by the client and better revenue. We can understand which tasks take more time and estimate better what we can ask for. (María José Nieto Hidalgo, Digital Consultant; incentivized review)