Apache Spark vs. Elasticsearch

Apache Spark

165 Reviews and Ratings

Elasticsearch

221 Reviews and Ratings

Overview
Product	Rating	Most Used By	Product Summary	Starting Price
Apache Spark	Score 8.9 out of 10	N/A	Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.	N/A
Elasticsearch	Score 8.5 out of 10	N/A	Elasticsearch is an enterprise search tool from Elastic in Mountain View, California.	$16 per month

Pricing

	Apache Spark	Elasticsearch
Editions & Modules	No answers on this topic	Standard: $16.00 per month; Gold: $19.00 per month; Platinum: $22.00 per month; Enterprise: Contact Sales

Offerings

Pricing Offerings	Apache Spark	Elasticsearch
Free Trial	No	No
Free/Freemium Version	No	No
Premium Consulting/Integration Services	No	No
Entry-level Setup Fee	No setup fee	No setup fee
Additional Details	N/A	N/A


Community Pulse
Considered Both Products

Apache Spark (every reviewer listed here chose Apache Spark)

Ananth Gouri, Assistant Professor [incentivized]: We used Surprise Kit for one of our other research works. It is more fine-tuned to recommendation systems and their algorithms. Apache Spark has MLlib for the majority of ML problems, whereas software like Surprise Kit is suitable only for the specific task of recommendations.
Riyaz Khan, Staff Engineer [incentivized]: Apache Spark is a fast-processing in-memory computing framework. It is 10 times faster than Apache Hadoop. Earlier we were using Apache Hadoop for processing data on disk, but now we have shifted to Apache Spark because of its in-memory computation capability. Also in SAP …
Steven Li, Senior Software Developer (Consultant) [incentivized]: Other teams used to work on Apache Hadoop, but our team started with Apache Spark directly.
Verified User, Anonymous [incentivized]: There are a few alternatives that can do the same transformations and aggregations as Apache Spark, but most of them are not able to perform parallel computation. For example, pandas is a really good tool for that but is not parallelized; however, there are some tools that …
Surendranatha Reddy Chappidi, Senior Data Engineer [incentivized]: Apache Spark works in distributed mode using a cluster. Informatica and DataStage cannot scale horizontally. We can write custom code in Spark, whereas in DataStage and Informatica we can only choose from the features already provided.
Verified User, Anonymous [incentivized]: Apache Spark has much better performance and features compared with Hive or MapReduce-style solutions. Spark has many other features for machine learning and streaming.
Chetan Munegowda, Software Engineer [incentivized]: Spark is simply awesome to work on with any data sets and also has an in-memory database, which makes it very flexible.
Yogesh Mhasde, Technical Manager [incentivized]: 1. Apache Spark is almost 100% faster than Hadoop. 2. Apache Spark is more stable than Amazon EMR. 3. The end to end distributed machine library is more robust in Apache Spark.
Verified User, Anonymous [incentivized]: Databricks uses Spark as a foundation, and is also a great platform. It does bring several add-ons, which we did not feel we needed at the time we evaluated it, and haven't needed since then. One interesting plus in our opinion was the engineering support, which is great depending …
Verified User, Anonymous [incentivized]: It is easy to learn, read and to maintain. It brings the best of the Ruby on Rails framework from Java that helps to create a web service so easily. Communication is one of the most distinctive features of Apache Spark compared to alternative products. You are able to …
Shiv Shivakumar, Acquisitions Leader [incentivized]: We evaluated SAS alongside Apache Spark, but during the course of the proof of concept found that Apache Spark was able to support the Hadoop ecosystem and Hadoop file system much better. It was much faster at that time while having the ability to process data quickly for the …
Carla Borges, Technical Consultant - Java Developer and PHP Developer [incentivized]: I prefer Apache Spark compared to Hadoop, since in my experience Spark has more usability and comes equipped with simple APIs for Scala, Python, Java and Spark SQL, as well as provides feedback in REPL format on the commands. At the same time, Apache Spark seems to have the …
Nitin Pasumarthy, Software Engineer [incentivized]: All the above systems work quite well on big data transformations, whereas Spark really shines with its bigger API support and its ability to read from and write to multiple data sources. Using Spark one can easily switch between declarative versus imperative versus functional …
Kartik Chavan, Data Analyst [incentivized]: Even with Python, MapReduce is lengthy coding. The combination of Python with Apache Spark will not only shorten the code, but it will effectively increase the speed of algorithms. Occasionally, I use MapReduce, but Apache Spark will replace MapReduce very soon. It has many …
Anson Abraham, Data Czar [incentivized]: Versus MapReduce, it was faster and easier to manage, especially for machine learning, where MapReduce is lacking. Also, Apache Storm was slower and didn't scale as much as Spark does. Spark elasticity was easier to apply compared to Storm and MapReduce. Managing resources for …
Verified User, Anonymous [incentivized]: We specifically chose Spark over MapReduce to make the cluster processing faster.
Verified User, Anonymous [incentivized]: Spark in comparison to similar technologies ends up being a one stop shop. You can achieve so much with this one framework instead of having to stitch and weave multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and …
Kamesh Emani, Software Developer Intern [incentivized]: Apache Pig and Apache Hive provide most of the things Spark provides, but Apache Spark has more features, like actions and transformations, which are easy to code. Spark uses optimization technique as we can select driver program and manipulate DAG (Directed Acyclic Graph) Python …
Verified User, Anonymous [incentivized]: There are a few newer frameworks for general processing like Flink, Beam, frameworks for streaming like Samza and Storm, and traditional Map-Reduce. I think Spark is at a sweet spot where it's clearly better than Map-Reduce for many workflows yet has gotten a good amount of …
Jordan Moore, Staff Consultant [incentivized]: Spark has primarily replaced my use of writing pure Hadoop MapReduce or Apache Pig jobs for processing data. I like the fact that I can alternate between the main programming languages that I know - Java and Python - and use those to learn the Scala API. Spark also can be …

Elasticsearch (every reviewer listed here chose Elasticsearch)

Verified User, Anonymous [incentivized]: Elasticsearch has a steep learning curve, but it is the best in terms of customization and use cases; it can cover most business needs. The other tools might be easier to integrate with and start seeing results, but you will end up having issues when you need customized …
Julie Zhong, Data analytics [incentivized]: Elasticsearch is relatively cheaper than Splunk. OpenSearch is good and we migrated some data into it, but the critical data stays in Elasticsearch as it has formal support.
John Anderson, Lead Application Engineer IV [incentivized]: They all have their specific pros and cons. Elastic was actually initially brought in to provide less expensive functionality to Splunk, and Splunk use cases. Grafana was brought in to provide less expensive visualizations compared to Splunk and Elastic... I would recommend …
Borislav Traykov, DevOps Team Leader [incentivized]: Elasticsearch is the most well-known and supported free data platform that we identified. We are taking advantage of community knowledge and practices. In terms of flexibility and breadth of use cases no other competitor came close to Elasticsearch. We've tried Solr in the past …
Oscar Narváez Del Rio, Tools & Analytic Monitoring Leader [incentivized]: Elasticsearch brings the capacity to grow data ingest and provides 24/7 visibility into critical services across IT and Business teams. With Elasticsearch, specialized support teams can easily view all the relevant information by using real-time dashboards, and can immediately …
Keith Lubell, Chief Technology Officer [incentivized]: Elasticsearch and Solr are both based on Lucene, but the user community for Elasticsearch is much stronger, and setting up a cluster is easier. Splunk is very well suited for log indexing and searching but is not nearly as flexible as Elasticsearch. Couchbase is a great NoSQL …
Swastik Nath, Data Scientist [incentivized]: Search and analytics capabilities of Elasticsearch are superior to its competitors. Being open source, it is a cheaper and faster solution than other competitors. Installation is straightforward and it can be potentially deployed anywhere and everywhere! There is no need for …
Verified User, Anonymous [incentivized]: Faster, better, more efficient. There was no comparison in Elasticsearch vs LEM. AlienVault was decent but too expensive for what it does compared to Elastic. The only competitor I'd consider as in the same ballpark in the SIEM world is Splunk. Save yourself the money and get a …
Verified User, Anonymous [incentivized]: I think Elasticsearch works less well compared to Splunk. Mainly, the way the Splunk search head works is vastly superior to the way the Elasticsearch query language works. Furthermore, the Splunk architecture is in my opinion easier to roll out and scale up. Splunk also has a …
Maria Sousa, Managing Engineer [incentivized]: Elasticsearch is very well packed with a broad set of features, ranging from customization capabilities to security and add-ons, and also comes with a great visualization tool named Kibana. Most of the competitors are strong in some of these areas, but I know of no other that's …
Verified User, Anonymous [incentivized]: Amazon CloudWatch
Mark Freeman, MBA, Director Enterprise Architectures [incentivized]: SharePoint seems antiquated. Amazon Elastic File System is hard to find things. Google Search Appliance does ranking poorly.
Verified User, Anonymous [incentivized]: Elasticsearch is more expensive, especially on disk storage. In terms of functionality and ease of use, it's better than most solutions out there.
Verified User, Anonymous [incentivized]: Almost no one uses Solr anymore--most have migrated to Elasticsearch. I've never tried it myself but I heard Solr is much more difficult to configure and because it doesn't use a REST API, it locks you into Java and XML. XML--ick! Lucene: Elasticsearch is built using Lucene …
Erlon Sousa Pinheiro, Senior DevOps Engineer [incentivized]: From my perspective, there is nothing currently on the market better than Datadog, but unfortunately, that's a pricey product. Elasticsearch delivers part of Datadog's functionality while being cheaper. Fluentd as a service (provided by the company behind Fluentd) looks like a …
Gary Davis, Director [incentivized]: Previously, we used Microsoft SQL Server's full-text search. Elasticsearch is faster, and that includes searching, indexing, and re-indexing the catalog of products.
Gedson Silva, Senior Production Engineer [incentivized]: Elasticsearch is much easier to set up and maintain. It provides better distributed architecture and fault tolerance, and is much faster at searching.
Jose Adan Ortiz, Sales Engineer [incentivized]: With Elasticsearch you can integrate a lot of data sources. It can act as a small data lake where you can put different kinds of data and extract important insights. With Splunk, in addition to the elevated costs of licensing and hardware, you need to have expert engineers to address …
Verified User, Anonymous [incentivized]: All database systems have things they are good at, and things they aren't as good at. Riak/SOLR is great as a K/V store, but SOLR cannot handle requests as fast as ElasticSearch. In fact, SOLR is the reason we had to migrate to ElasticSearch. Redis is great at SET operations …
Verified User, Anonymous [incentivized]: ES does not compete with the above packages but complements them. By automating and mining logs, you are able to get a sense of the business process, marketing data or whatever else you need to capture and mine. The potential energy stored within Elasticsearch makes it a great …
Verified User, Anonymous [incentivized]: Elasticsearch is the most powerful and easy to use platform in this market. It's open source, which makes enhancements very possible and also makes customization something that is commonplace. We're able to create custom modules to pull data from both log and config files, which …
Verified User, Anonymous [incentivized]: As far as we are concerned, Elasticsearch is the gold standard and we have barely evaluated any alternatives. You could consider it an alternative to a relational or NoSQL database, so in cases where those suffice, you don't need Elasticsearch. But if you want powerful …
Anatoly Geyfman, Founder & CEO [incentivized]: When we first evaluated Elasticsearch, we compared it with alternatives like traditional RDBMS products (Postgres, MySQL) as well as other NoSQL solutions like Cassandra & MongoDB. For our use case, Elasticsearch delivered on two fronts. First, we got a world-class search …
Tarun Mangukiya, Co-Founder [incentivized]: We've not tried any other products as Elasticsearch is the best fit for our requirements.
Josh Kramer, Senior Software Engineer [incentivized]: Power and simplicity along with performance.

Best Alternatives
	Apache Spark	Elasticsearch
Small Businesses	No answers on this topic	Yext Score 7.4 out of 10
Medium-sized Companies	Cloudera Manager Score 9.9 out of 10	Guru Score 9.3 out of 10
Enterprises	IBM Analytics Engine Score 8.6 out of 10	Guru Score 9.3 out of 10

User Ratings
	Apache Spark	Elasticsearch
Likelihood to Recommend	9.0 (0 ratings)	9.0 (0 ratings)
Likelihood to Renew	10.0 (0 ratings)	10.0 (0 ratings)
Usability	8.0 (0 ratings)	10.0 (0 ratings)
Support Rating	8.7 (0 ratings)	7.8 (0 ratings)
Implementation Rating	- (0 ratings)	9.0 (0 ratings)

User Testimonials
	Apache Spark	Elasticsearch
Likelihood to Recommend	Apache Spark has rich APIs for regular data transformations, ML workloads, and graph workloads, whereas other systems may not offer such a wide range of support. Choose it when you need to perform data transformations for big data as offline jobs; use MongoDB-like distributed database systems for more real-time queries. [Nitin Pasumarthy, Software Engineer; incentivized]	Elasticsearch is really well suited for searching text (Natural Language Processing) and you can fine tune the searches and scoring very well. I like the ability to find Significant Terms in the Index, where you can find aggregations that are really relevant to a specific search. It also allows for queries to lead to new queries via aggregations, which is great for navigating your data. It is less suited to doing more complex aggregations where slices of data are required to be processed using Gaussian normalizations. And doing searches which join different documents is very very hard, and requires serious thought on how to denormalize data. [Keith Lubell, Chief Technology Officer; incentivized]
Pros	It performs a conventional disk-based process when the data sets are too large to fit into memory, which is very useful because, regardless of the size of the data, it is always possible to store them. It has great speed and ability to join multiple types of databases and run different types of analysis applications. This functionality is super useful as it reduces work times. Apache Spark uses the data storage model of Hadoop and can be integrated with other big data frameworks such as HBase, MongoDB, and Cassandra. This is very useful because it is compatible with multiple frameworks that the company has, and thus allows us to unify all the processes. [Carla Borges, Technical Consultant - Java Developer and PHP Developer; incentivized]	Super-fast search on millions of documents. We've got over 2 billion documents in our index and the retrieve speeds are still in the < 1-second range. Analytics on top of your search. If you organize your data appropriately, Elasticsearch can serve as a distributed OLAP system. Elasticsearch is great for geographic data as well, including searching and filtering with GeoJSON, and a variety of geospatial algorithms. [Anatoly Geyfman, Founder & CEO; incentivized]
Cons	Memory management: very weak on that. PySpark is not as robust as Scala with Spark. Spark master HA is needed; it is not as HA as it should be. Locality should not be a necessity, but it does help improvement; I would prefer no locality. [Anson Abraham, Data Czar; incentivized]	Setting Java memory thresholds can be a pain for those not accustomed to things like Eden Space & Old Generation, which can lead to over allocation or, more likely, under allocation. Apache Solr had a similar issue. It would be nice if the program would take an extra step and dogfood its own advice by analyzing the system & processes to return a solid recommendation for that configuration. The proper configuration information is outlined in the documentation; it would be nice if that was automated. The only health check that ElasticSearch reports back is a "red" status without any real solid information about what is going on, though it's usually memory thresholds or disk I/O. I am currently on ElasticSearch 1.5, so that may have changed for newer versions. When the status goes "red", I, as the administrator of the software, feel like I lose control of what's going on, which should rarely happen. Something more verbose would eliminate that. This is more of a critique of the ElasticStack in general. The whole top to bottom stack is starting to get feature creep with things that are better suited in other software, increasing the barrier for entry for people to get started with setting up a robust logging infrastructure. ElasticSearch as a storage search engine is pretty streamlined, but I can see that the tools that comprise the ELK Stack are going to require a certification with constant study at some point. During a major release for Logstash a while back, it literally took a month to learn a new language because Elastic completely changed the syntax. For a medium sized organization of only a couple of admins, that is a pretty high bar where time is money. They really should work on refining/automating the tools & search engine they have, instead of shoehorning/changing things on to an already rock solid foundation. [Colby Shores, DevOps Engineer; incentivized]
Likelihood to Renew	Capacity of computing data in cluster and fast speed. [Steven Li, Senior Software Developer (Consultant)]	We're pretty heavily invested in ElasticSearch at this point, and there aren't any obvious negatives that would make us reconsider this decision. [Aaron Gussman, Senior Technologist; incentivized]
Usability	If the team looking to use Apache Spark is not used to debugging and tweaking job settings to ensure maximum optimization, it can be frustrating. However, the documentation and the support of the community on the internet can help resolve most issues. Moreover, it is highly configurable and it integrates with different tools (e.g., it can be used by dbt Core), which increases the scenarios where it can be used. [Verified User, Anonymous; incentivized]	To get started with Elasticsearch, you don't have to get very involved in configuring what really is an incredibly complex system under the hood. You simply install the package, run the service, and you're immediately able to begin using it. You don't need to learn any sort of query language to add data to Elasticsearch or perform some basic searching. If you're used to any sort of RESTful API, getting started with Elasticsearch is a breeze. If you've never interacted with a RESTful API directly, the journey may be a little more bumpy. Overall, though, it's incredibly simple to use for what it's doing under the covers. [Verified User, Anonymous; incentivized]
Support Rating	1. It integrates very well with Scala or Python. 2. Its SQL interoperability is very easy to understand. 3. Apache Spark is way faster than the other competitive technologies. 4. The support from the Apache community is huge for Spark. 5. Execution times are faster as compared to others. 6. There are a large number of forums available for Apache Spark. 7. The code availability for Apache Spark is simpler and easy to gain access to. 8. Many organizations use Apache Spark, so many solutions are available for existing applications. [Yogesh Mhasde, Technical Manager]	We've only used it as open-source tooling. We did not purchase any additional support to roll out the Elasticsearch software. When rolling out the application on our platform we used the documentation which was available online. During our test phases we did not experience any bugs or issues, so we did not rely on support at all. [Verified User, Anonymous; incentivized]
Implementation Rating	No answers on this topic	Do not mix data and master roles. Dedicate at least 3 nodes just for Master. [Verified User, Anonymous; incentivized]
Alternatives Considered	We used Surprise Kit for one of our other research works. It is more fine-tuned to recommendation systems and their algorithms. Apache Spark has MLlib for the majority of ML problems, whereas software like Surprise Kit is suitable only for the specific task of recommendations. [Ananth Gouri, Assistant Professor; incentivized]	Elasticsearch is the most well-known and supported free data platform that we identified. We are taking advantage of community knowledge and practices. In terms of flexibility and breadth of use cases no other competitor came close to Elasticsearch. We've tried Solr in the past but we encountered issues which were deal-breaking for us. MongoDB - it just did not pass our evaluation parameters as a main data platform. We still use it for smaller purposes, though. [Borislav Traykov, DevOps Team Leader; incentivized]
Return on Investment	Faster turn around on feature development, we have seen a noticeable improvement in our agile development since using Spark. Easy adoption, having multiple departments use the same underlying technology even if the use cases are very different allows for more commonality amongst applications which definitely makes the operations team happy. Performance, we have been able to make some applications run over 20x faster since switching to Spark. This has saved us time, headaches, and operating costs. [Verified User, Anonymous; incentivized]	I am not in finance and I suspect even if I was this would be hard to measure. But for sure, Elasticsearch has enabled us to have the most flexible data model in the industry for our customer's data, and in doing so we have attracted many many technical customers and got much of their $$$. One problem with Elasticsearch is that because it runs on the JVM, there can be some stop-the-world JVM garbage collections happening that can take down nodes and reduce indexing speed. The solution for that tends to be "let's just upgrade the CPU on that machine". And before you know it you are paying $$$ because this'll happen with 40+ machines. On the other hand, I do think that ES is more efficient than other systems and so it requires fewer nodes to keep it highly tolerant and available, so we probably saved some money that way. [Verified User, Anonymous; incentivized]
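
The Usability testimonial above says that anyone used to RESTful APIs can add data and run basic searches in Elasticsearch without first learning a query language. As a rough illustration of what that looks like, the sketch below builds (but does not send) two HTTP requests with Python's standard library: one that stores a JSON document and one that runs a basic `match` search. The node address `http://localhost:9200`, the `products` index, the sample document, and both helper function names are assumptions made up for this example, not details taken from the reviews.

```python
import json
from urllib.request import Request

# Assumption: a local, unsecured development node. A real cluster would
# normally need HTTPS and credentials.
BASE_URL = "http://localhost:9200"


def build_index_request(index: str, doc_id: str, document: dict) -> Request:
    """Build (but do not send) a request that stores one JSON document."""
    return Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_search_request(index: str, field: str, text: str) -> Request:
    """Build (but do not send) a basic full-text `match` search."""
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical index name and document, for illustration only.
    index_req = build_index_request("products", "1", {"name": "blue kettle"})
    search_req = build_search_request("products", "name", "kettle")
    print(index_req.get_method(), index_req.full_url)
    print(search_req.get_method(), search_req.full_url)
```

Nothing is sent over the network here; with a node actually running, either request object could be passed to `urllib.request.urlopen`. Keeping request construction separate from sending makes the shape of the calls easy to inspect before pointing them at a real cluster.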