Apache Spark vs. Elasticsearch vs. Amazon Redshift

Apache Spark

165 Reviews and Ratings

Elasticsearch

217 Reviews and Ratings

Amazon Redshift

218 Reviews and Ratings

Overview
Product	Rating	Most Used By	Product Summary	Starting Price
Apache Spark	Score 9.0 out of 10	N/A	Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.	N/A
Elasticsearch	Score 8.7 out of 10	N/A	Elasticsearch is an enterprise search tool from Elastic in Mountain View, California.	$16 per month
Amazon Redshift	Score 8.9 out of 10	N/A	Amazon Redshift is a hosted data warehouse solution, from Amazon Web Services.	$0.24 per GB per month

Pricing

Editions & Modules

Apache Spark
No answers on this topic

Elasticsearch
Standard: $16.00 per month
Gold: $19.00 per month
Platinum: $22.00 per month
Enterprise: Contact Sales

Amazon Redshift
Redshift Managed Storage: $0.24 per GB per month
Current Generation: $0.25 - $13.04 per hour
Previous Generation: $0.25 - $4.08 per hour
Redshift Spectrum: $5.00 per terabyte of data scanned
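
To make the Redshift line items above concrete, here is a minimal Python sketch of how the listed rates could combine into a rough monthly figure. Only the $0.24 per GB-month storage rate, the $0.25 per hour low-end node rate, and the $5.00 per TB Spectrum rate come from the list above. The function name, the 730-hour month, and the example workload are illustrative assumptions, and the sketch does not model which node types are actually billed for managed storage.

```python
# Rates taken from the pricing list above; everything else is an assumption.
STORAGE_USD_PER_GB_MONTH = 0.24   # Redshift Managed Storage
SPECTRUM_USD_PER_TB = 5.00        # Redshift Spectrum, per TB of data scanned
HOURS_PER_MONTH = 730             # assumed average month length in hours


def estimate_monthly_cost(node_usd_per_hour, node_count, storage_gb, spectrum_tb_scanned):
    """Rough monthly estimate in USD (hypothetical helper, not an AWS API)."""
    compute = node_usd_per_hour * node_count * HOURS_PER_MONTH
    storage = storage_gb * STORAGE_USD_PER_GB_MONTH
    spectrum = spectrum_tb_scanned * SPECTRUM_USD_PER_TB
    return round(compute + storage + spectrum, 2)


# Made-up workload: two nodes at the $0.25/hour low end,
# 500 GB of managed storage, 10 TB scanned through Spectrum.
print(estimate_monthly_cost(0.25, 2, 500, 10))
```

Because the Spectrum charge scales with data scanned rather than with time, it is the term most likely to drift in a self-serve setup, which matches the cost-tracking concern raised in the reviews.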

Offerings

Pricing Offerings
	Apache Spark	Elasticsearch	Amazon Redshift
Free Trial	No	No	No
Free/Freemium Version	No	No	No
Premium Consulting/Integration Services	No	No	No
Entry-level Setup Fee	No setup fee	No setup fee	No setup fee
Additional Details	N/A	N/A	N/A

Community Pulse
	Apache Spark	Elasticsearch	Amazon Redshift
Considered Multiple Products	There are a few newer frameworks for general processing like Flink and Beam, frameworks for streaming like Samza and Storm, and traditional MapReduce. I think Spark is at a sweet spot where it's clearly better than MapReduce for many workflows yet has gotten a good amount of … (Verified User, Director; chose Apache Spark; incentivized)	They all have their specific pros and cons. Elastic was actually initially brought in as a less expensive alternative to Splunk for Splunk use cases. Grafana was brought in to provide less expensive visualizations compared to Splunk and Elastic... I would recommend … (John Anderson, Lead Application Engineer IV; chose Elasticsearch; incentivized) Power and simplicity along with performance. (Josh Kramer, Senior Software Engineer; chose Elasticsearch; incentivized)	The biggest advantage of Amazon Redshift is that it's part of the AWS ecosystem. When tuned well it is also very cheap compared to something like Snowflake. And compared to Spark or Databricks, Amazon Redshift is a solid warehouse that's well suited for tabular data. We use it for user … (Dileep Kumar, Principal Data Scientist; chose Amazon Redshift; incentivized) Prezi is using AWS, so Amazon Redshift was the obvious choice. It is fast, scalable, and easy to use. Supplemented with Spark and Hive, I'm completely satisfied using Redshift. Sometimes I miss commands I used earlier on MS and Oracle SQL, and the lack of procedural features is … (Tamás Imre, Lead Analyst; chose Amazon Redshift; incentivized) Compared to Vertica: Redshift is cheaper and AWS integrated (which was a plus because the whole company was on AWS). Compared to BigQuery: Redshift has a standard SQL interface, though recently I heard good things about BigQuery and would try it out again. (Verified User, Director; chose Amazon Redshift; incentivized)

Best Alternatives
	Apache Spark	Elasticsearch	Amazon Redshift
Small Businesses	No answers on this topic	Yext Score 8.0 out of 10	Google BigQuery Score 8.8 out of 10
Medium-sized Companies	Cloudera Manager Score 9.9 out of 10	Guru Score 9.6 out of 10	Snowflake Score 8.7 out of 10
Enterprises	IBM Analytics Engine Score 7.2 out of 10	Guru Score 9.6 out of 10	Snowflake Score 8.7 out of 10

User Ratings
	Apache Spark	Elasticsearch	Amazon Redshift
Likelihood to Recommend	9.0 (24 ratings)	9.0 (48 ratings)	9.0 (38 ratings)
Likelihood to Renew	10.0 (1 rating)	10.0 (1 rating)	- (0 ratings)
Usability	8.0 (4 ratings)	10.0 (1 rating)	9.0 (10 ratings)
Support Rating	8.7 (4 ratings)	7.8 (9 ratings)	9.0 (7 ratings)
Implementation Rating	- (0 ratings)	9.0 (1 rating)	- (0 ratings)
Contract Terms and Pricing Model	- (0 ratings)	- (0 ratings)	10.0 (1 rating)

User Testimonials
	Apache Spark	Elasticsearch	Amazon Redshift
Likelihood to Recommend	Well suited: for most local runs on datasets and for non-production systems, where scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: we had to work on a RecSys where the music dataset was 300+ GB in size, and we ran into memory issues, including occasional memory errors. MLlib also lacks support for advanced analytics and deep-learning frameworks. Understanding the internals of Apache Spark is very hard for beginners. (Ananth Gouri, Assistant Professor; incentivized)	Elasticsearch is a really scalable solution that can fit a lot of needs, but the bigger those needs become, the more understanding and infrastructure you will need for your instance to run correctly. Elasticsearch is not problem-free: you can get yourself into a lot of trouble if you are not following good practices or not managing the cluster correctly. Licensing is a big decision point here, as Elasticsearch is a middleware component, so be sure to read the licensing agreement of the version you want to try before you commit to it. The same goes for long-term support: keep yourself informed, or you may end up stuck with an unpatched version for years. (Borislav Traykov, DevOps Team Leader; incentivized)	If the number of connections is expected to be low but the amount of data is large or projected to grow, it is a good solution, especially if there is previous exposure to PostgreSQL. Speaking of Postgres, Redshift is based on a PostgreSQL release that is several versions old, so developers cannot take advantage of some of the newer SQL language features. The queries still need some fine-tuning: indexing is not provided, so working with sort keys becomes necessary. Lastly, there is no enforced Primary Key in Redshift, so the business must stay vigilant for duplication and be prepared to explain why it occurred. (Arthur Zubarev, Senior Business Intelligence Consultant; incentivized)
Pros	Rich APIs for data transformation, making it very easy to transform and prepare data in a distributed environment without worrying about memory issues. Faster execution times compared to Hadoop and Pig Latin. Easy SQL interface to the same data set for people who are comfortable exploring data in a declarative manner. Interoperability between SQL and Scala/Python styles of munging data. (Nitin Pasumarthy, Software Engineer; incentivized)	As I mentioned before, Elasticsearch's flexible data model is unparalleled. You can nest fields as deeply as you want, have as many fields as you want, and put whatever you want in those fields (as long as it stays the same type); all of it will be searchable, and you don't even need to declare a schema beforehand! Elastic, the company behind Elasticsearch, is super strong financially and has a great team of devs and product managers working on Elasticsearch. When I first started using ES 3 years ago, I was 90% impressed and knew it would be a good fit. 3 years later, I am 200% impressed and blown away by how far it has come and how much better it has gotten. If there are features that are missing or you don't think it's fast enough right now, I bet it'll be suitable next year because the team behind it is so dang fast! Elasticsearch is really, really stable. It takes a lot to bring down a cluster. Its self-balancing algorithms, leader-election system, and self-healing properties are state of the art. We've never seen network failures, hard-drive corruption, or CPU bugs bring down an ES cluster. (Verified User, Anonymous; incentivized)	[Amazon] Redshift has Distribution Keys. If you define them correctly on your tables, query performance improves. For instance, we can define mapping/metadata tables with the ALL distribution style so that they are replicated across all nodes, for fast joins and fast query results. [Amazon] Redshift has Sort Keys. If you define them correctly on your tables along with the Distribution Keys above, query performance improves further. It also has Compound Sort Keys and Interleaved Sort Keys to support various use cases. [Amazon] Redshift was forked from PostgreSQL, and AWS then added "MPP" (Massively Parallel Processing) and "Column Oriented" concepts to it to make it a powerful data store. [Amazon] Redshift has an "Analyze" operation that can be run on tables to update the table's stats in the leader node. This is a sort of ledger of which data is stored on which node and in which partition within a node. Up-to-date stats improve query performance. (Narayan Motamarri, Staff Data Engineer; incentivized)
Cons	Memory management is very weak. PySpark is not as robust as Scala with Spark. Spark master HA is needed; it is not as highly available as it should be. Locality should not be a necessity, though it does help performance; I would prefer not to depend on locality. (Anson Abraham, Data Czar; incentivized)	Joining data requires duplicated, denormalized documents that model parent-child relationships; this is hard and requires a lot of synchronization. Tracking errors in the data through the logs can be hard, and recurring errors sometimes blow up the error logs. Schema changes require completely reindexing an index. (Keith Lubell, Chief Technology Officer; incentivized)	We've experienced some problems with hanging queries on Redshift Spectrum/external tables. We've had to roll back to an old version of Redshift while we wait for AWS to provide a patch. Redshift's dialect is most similar to that of PostgreSQL 8. It lacks many modern features and data types. Constraints are not enforced, so we must rely on other means to verify the integrity of transformed tables. (Gavin Hackeling, Data Scientist; incentivized)
Likelihood to Renew	Capacity for computing data in a cluster, and fast speed. (Steven Li, Senior Software Developer (Consultant))	We're pretty heavily invested in Elasticsearch at this point, and there aren't any obvious negatives that would make us reconsider this decision. (Aaron Gussman, Senior Technologist; incentivized)	No answers on this topic
Usability	If the team looking to use Apache Spark is not used to debugging and tweaking job settings to get the best performance, it can be frustrating. However, the documentation and the community support available online can help resolve most issues. Moreover, it is highly configurable and integrates with different tools (e.g., it can be used by dbt Core), which increases the scenarios where it can be used. (Verified User, Anonymous; incentivized)	To get started with Elasticsearch, you don't have to get very involved in configuring what really is an incredibly complex system under the hood. You simply install the package, run the service, and you're immediately able to begin using it. You don't need to learn any sort of query language to add data to Elasticsearch or perform some basic searching. If you're used to any sort of RESTful API, getting started with Elasticsearch is a breeze. If you've never interacted with a RESTful API directly, the journey may be a little more bumpy. Overall, though, it's incredibly simple to use for what it's doing under the covers. (Verified User, Anonymous; incentivized)	Just very happy with the product; it fits our needs perfectly. Amazon pioneered the cloud, and we have had a positive experience using Redshift. Really cool to be able to see your data housed and to be able to query and perform administrative tasks with ease. (Brendan McKenna, Senior Developer; incentivized)
Support Rating	1. It integrates very well with Scala or Python. 2. Its SQL interoperability is very easy to understand. 3. Apache Spark is way faster than competing technologies. 4. Community support for Spark is very strong. 5. Execution times are faster compared to others. 6. There are a large number of forums available for Apache Spark. 7. The code for Apache Spark is easy to access. 8. Many organizations use Apache Spark, so many solutions are available for existing applications. (Yogesh Mhasde, Technical Manager)	We've only used it as an open-source tool. We did not purchase any additional support to roll out Elasticsearch. When rolling out the application on our platform, we used the documentation available online. During our test phases we did not experience any bugs or issues, so we did not rely on support at all. (Verified User, Anonymous; incentivized)	The support was great and helped us in a timely fashion. We did use a lot of online forums as well, but the official documentation was still a work in progress, and it took us more time to look through it. We would probably have chosen a competitor product had it not been for the great support. (Verified User, Anonymous; incentivized)
Implementation Rating	No answers on this topic	Do not mix data and master roles. Dedicate at least 3 nodes just for the master role. (Verified User, Anonymous; incentivized)	No answers on this topic
Alternatives Considered	Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing. (Verified User, Anonymous; incentivized)	As far as we are concerned, Elasticsearch is the gold standard and we have barely evaluated any alternatives. You could consider it an alternative to a relational or NoSQL database, so in cases where those suffice, you don't need Elasticsearch. But if you want powerful text-based search capabilities across large data sets, Elasticsearch is the way to go. (Verified User, Anonymous; incentivized)	Compared to Vertica: Redshift is cheaper and AWS integrated (which was a plus because the whole company was on AWS). Compared to BigQuery: Redshift has a standard SQL interface, though recently I heard good things about BigQuery and would try it out again. Compared to Hive: Hive is great if you are in the PB+ range, but latencies tend to be much slower than Redshift and it is not suited for ad-hoc applications. (Verified User, Anonymous; incentivized)
Contract Terms and Pricing Model	No answers on this topic	No answers on this topic	Redshift is a relatively cheap tool, but since the pricing is dynamic, there is always a risk of exceeding the expected cost. Since most of our team uses it as self-serve and there is no continuous tracking by a dedicated team, it really takes time and effort on the analyst's side to know how much it is going to cost. (Sameera Srivastava, Analytics Lead; incentivized)
Return on Investment	Business leaders are able to make data-driven decisions. Business users are now able to access data in near real time; before using Spark, they had to wait at least 24 hours for data to be available. The business is able to come up with new product ideas. (Surendranatha Reddy Chappidi, Senior Data Engineer; incentivized)	We have had great luck implementing Elasticsearch for our search and analytics use cases. While the operational burden is not minimal (operating a cluster of servers, using a custom query language, writing Elasticsearch-specific bulk insert code), the performance and the relative operational ease of Elasticsearch are unparalleled. We've easily saved hundreds of thousands of dollars implementing Elasticsearch vs. RDBMS vs. other NoSQL solutions for our specific set of problems. (Anatoly Geyfman, Founder & CEO; incentivized)	Our company is moving to the AWS infrastructure, and in this context moving the warehouse environments to Redshift sounds logical regardless of the cost. Development organizations have to operate in DevOps mode, where they build and support their apps at the same time. It is hard to estimate the overall ROI of moving to Redshift from my position. However, running Redshift seems to be inexpensive compared to all the licensing and hardware costs we had on our RDBMS platform before Redshift. (Michael Romm, Principal Data Architect; incentivized)
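
The Elasticsearch implementation advice in the table above (at least 3 dedicated master nodes) can be read as a consequence of majority voting: a master can only be elected while more than half of the master-eligible nodes are reachable. The following is a minimal Python sketch of that arithmetic under a simple-majority assumption; the function names are hypothetical and this is not Elasticsearch code.

```python
def quorum(master_eligible_nodes):
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes):
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


# Print node count, required majority, and tolerated failures.
for n in (1, 2, 3, 5):
    print(n, quorum(n), tolerated_failures(n))
```

Under this assumption, one or two master-eligible nodes tolerate no failures, and three is the smallest count that survives the loss of a single node.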