Apache Cassandra vs. Apache Spark

Apache Cassandra

94 Reviews and Ratings

Apache Spark

159 Reviews and Ratings

Overview
Product	Rating	Most Used By	Product Summary	Starting Price
Apache Cassandra	Score 7.8 out of 10	N/A	Cassandra is a NoSQL database from Apache.	N/A
Apache Spark	Score 8.6 out of 10	N/A	N/A	N/A

Pricing

Pricing Offerings	Apache Cassandra	Apache Spark
Editions & Modules	No answers on this topic	No answers on this topic
Free Trial	No	No
Free/Freemium Version	No	No
Premium Consulting/Integration Services	No	No
Entry-level Setup Fee	No setup fee	No setup fee
Additional Details	N/A	N/A

Community Pulse
	Apache Cassandra	Apache Spark
Considered Both Products	Rekha Joshi, Staff Software Engineer, chose Apache Cassandra (incentivized review): "Apache Cassandra has the best of both worlds: it is a Java-based NoSQL database that is linearly scalable, fault tolerant, distributed, and masterless, with best-in-class tunable performance across different workloads, and it works well as a time-series database. We have used both Apache HBase and MongoDB for some use cases …" David Prinzing, Chief Technology Officer, chose Apache Cassandra (incentivized review): "Four years ago, I needed to choose a web-scale database. Having used relational databases for years (PostgreSQL is my favorite), I needed something that could perform well at scale with no downtime. I considered VoltDB for its in-memory speed, but it's limited in scale. I …"	Carla Borges, Technical Consultant - Java Developer and PHP Developer, chose Apache Spark (incentivized review): "I prefer Apache Spark to Hadoop, since in my experience Spark is more usable and comes equipped with simple APIs for Scala, Python, Java, and Spark SQL, and it provides feedback on commands through a REPL. At the same time, Apache Spark seems to have the …" Anson Abraham, Data Czar, chose Apache Spark (incentivized review): "Compared with MapReduce, it was faster and easier to manage, especially for machine learning, where MapReduce is lacking. Also, Apache Storm was slower and didn't scale as much as Spark does. Spark elasticity was easier to apply compared to Storm and MapReduce. Managing resources for …"
Top Pros	Data model; Query language; High availability	Machine learning; Data sets; Easy to use
Top Cons	Ad hoc; Relational database; View data	Data visualization; Learning curve; Amounts of data

Features

NoSQL Databases

Comparison of NoSQL Databases features of Apache Cassandra and Apache Spark
	Apache Cassandra: 8.0 (5 ratings), 9% below category average	Apache Spark: - (0 ratings)
Performance	8.5 (5 ratings)	- (0 ratings)
Availability	8.8 (5 ratings)	- (0 ratings)
Concurrency	7.6 (5 ratings)	- (0 ratings)
Security	8.0 (5 ratings)	- (0 ratings)
Scalability	9.5 (5 ratings)	- (0 ratings)
Data model flexibility	6.7 (5 ratings)	- (0 ratings)
Deployment model flexibility	7.0 (5 ratings)	- (0 ratings)

Best Alternatives
	Apache Cassandra	Apache Spark
Small Businesses	IBM Cloudant Score 8.3 out of 10	No answers on this topic
Medium-sized Companies	IBM Cloudant Score 8.3 out of 10	Cloudera Manager Score 9.7 out of 10
Enterprises	IBM Cloudant Score 8.3 out of 10	IBM Analytics Engine Score 8.8 out of 10

User Ratings
	Apache Cassandra	Apache Spark
Likelihood to Recommend	6.0 (16 ratings)	9.9 (24 ratings)
Likelihood to Renew	8.6 (16 ratings)	10.0 (1 rating)
Usability	7.0 (1 rating)	10.0 (3 ratings)
Support Rating	7.0 (1 rating)	8.7 (4 ratings)
Implementation Rating	7.0 (1 rating)	- (0 ratings)

User Testimonials
	Apache Cassandra	Apache Spark
Likelihood to Recommend	Apache Cassandra is a NoSQL database that is well suited to cases where you need high availability, linear scalability, tunable consistency, and high performance across varying workloads. It has worked well for our use cases, and I shared my experiences on using it effectively at the last Cassandra summit: http://bit.ly/1Ok56TK. Although it is a NoSQL database, you can tune it to be strongly consistent and successfully use it as such. However, that is not the usual pattern, because you trade away latency; it works well if you require it. If your use case needs a strongly consistent environment with the semantics of a relational database, a data warehouse, or NoSQL with ACID transactions, Apache Cassandra may not be the optimum choice. (Rekha Joshi, Staff Software Engineer; incentivized review)	Well suited: for most local runs on datasets and for non-production systems, scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: we had to work on a RecSys where the music dataset we used was more than 300 GB in size. We faced memory-related issues and got memory errors a few times. Also, MLlib does not support advanced analytics or deep-learning frameworks. For beginners, understanding the internals of Apache Spark is nearly impossible. (Ananth Gouri, Assistant Professor; incentivized review)
Pros	Continuous availability: as a fully distributed database (no master nodes), we can update nodes with rolling restarts and accommodate minor outages without impacting our customer services. Linear scalability: for every unit of compute that you add, you get an equivalent unit of capacity. The same application can scale from a single developer's laptop to a web-scale service with billions of rows in a table. Amazing performance: if you design your data model correctly, bearing in mind the queries you need to answer, you can get answers in milliseconds. Time-series data: Cassandra excels at recording, processing, and retrieving time-series data. It's a simple matter to version everything and simply record what happens, rather than going back and editing things. Then, you can compute things from the recorded history. (David Prinzing, Chief Technology Officer; incentivized review)	Apache Spark makes processing very large data sets possible. It handles these data sets in a fairly quick manner. Apache Spark does a fairly good job implementing machine learning models for larger data sets. Apache Spark seems to be a rapidly advancing software, with the new features making the software ever more straightforward to use. (Thomas Young, Owner, previous CEO; incentivized review)
Cons	Cassandra runs on the JVM and therefore may require a lot of GC tuning for read/write-intensive applications. It requires manual periodic maintenance; for example, it is recommended to run a cleanup on a regular basis. There are a lot of knobs and buttons to configure the system. In many cases the default configuration will be sufficient, but if it's not, you will need significant ramp-up on the inner workings of Cassandra in order to tune it effectively. (Verified User, Anonymous; incentivized review)	Memory management is very weak. PySpark is not as robust as Scala with Spark. Spark master HA is needed; it is not as highly available as it should be. Data locality should not be a necessity, although it does help; I would prefer not to depend on locality. (Anson Abraham, Data Czar; incentivized review)
Likelihood to Renew	I would recommend Cassandra to those who know their use case very well and know how they are going to store and retrieve data. If you need guarantees in data storage and retrieval, and a database that can grow linearly by adding nodes across availability zones and regions, then this is the database you should choose. (Verified User, Anonymous; incentivized review)	Its capacity for computing data in a cluster and its fast speed. (Steven Li, Senior Software Developer (Consultant))
Usability	It's a great tool, but it can be complicated when it comes to administration and maintenance. (Glen Kim, Senior Software Engineer; incentivized review)	The only thing I dislike about Spark's usability is the learning curve: there are many actions and transformations. However, its wide range of uses for ETL processing, its ease of integration, and its multi-language support make this library a powerhouse for your data science solutions. It has especially aided us with its lightning-fast processing times. (Verified User, Anonymous; incentivized review)
Support Rating	Sometimes, instead of getting a straight answer, we are transferred to talk to professional services. (Glen Kim, Senior Software Engineer; incentivized review)	1. It integrates very well with Scala or Python. 2. Its SQL interoperability is very easy to understand. 3. Apache Spark is way faster than the other competitive technologies. 4. The support from the Apache community for Spark is huge. 5. Execution times are faster compared to others. 6. There are a large number of forums available for Apache Spark. 7. Code for Apache Spark is simple and easy to gain access to. 8. Many organizations use Apache Spark, so many solutions are available for existing applications. (Yogesh Mhasde, Technical Manager)
Alternatives Considered	We also evaluated MongoDB, but we didn't like the possibility of a single point of failure. HBase coupled us too tightly to the Hadoop world, while we prefer more technical flexibility. Also, HBase is designed for "cold"/old historical data lake use cases and is not typically used for web and mobile applications because of performance concerns. Cassandra, by contrast, offers the availability and performance necessary for developing highly available applications. Furthermore, the Hadoop technology stack is typically deployed in a single location, while in a big international enterprise context we need to be able to deploy across countries and continents. Hence, we finally favored Cassandra. (Yixiang Shan, IT Strategic Technical Advisor; incentivized review)	All the above systems work quite well on big data transformations, whereas Spark really shines with its bigger API support and its ability to read from and write to multiple data sources. Using Spark, one can easily switch between declarative, imperative, and functional programming based on the situation. Also, it doesn't need special data ingestion or indexing pre-processing like Presto. Combining it with Jupyter Notebooks (https://github.com/jupyter-incubator/sparkmagic), one can develop Spark code interactively in Scala or Python. (Nitin Pasumarthy, Software Engineer; incentivized review)
Return on Investment	I have no direct experience with this, but from blogs and news I believe that in businesses where there is high demand for scalability, Cassandra is a good choice. Since it uses CQL, which is quite similar to SQL, it does not prevent a new employee from starting to learn Cassandra and gaining experience with it at an industrial level. (Verified User, Anonymous; incentivized review)	Faster turnaround on feature development: we have seen a noticeable improvement in our agile development since using Spark. Easy adoption: having multiple departments use the same underlying technology, even if the use cases are very different, allows for more commonality among applications, which definitely makes the operations team happy. Performance: we have been able to make some applications run over 20x faster since switching to Spark. This has saved us time, headaches, and operating costs. (Verified User, Anonymous; incentivized review)
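
One reviewer above notes that Cassandra can be tuned to be strongly consistent, at a cost in latency. The arithmetic usually given for that trade-off can be sketched as follows. This is a minimal illustration, not Cassandra driver code: the function names are hypothetical, and it assumes the common rule that a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor.

```python
# Illustrative helpers only; these names do not come from any Cassandra driver.


def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: a strict majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_acks: int,
                           read_acks: int) -> bool:
    """True when any set of read replicas must overlap any set of written replicas."""
    return read_acks + write_acks > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor for the example
    q = quorum(rf)
    print(f"RF={rf}: quorum needs {q} replicas")
    print("quorum write + quorum read overlap:", reads_see_latest_write(rf, q, q))
    print("single-replica write + read overlap:", reads_see_latest_write(rf, 1, 1))
```

Waiting for more replicas on each operation is what makes the strongly consistent setting slower, which matches the reviewer's remark that the trade is paid for in latency.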