What users are saying about

Apache Spark

97 Ratings

Cassandra

Top Rated
61 Ratings

Apache Spark

97 Ratings
<a href='https://www.trustradius.com/static/about-trustradius-scoring' target='_blank' rel='nofollow'>trScore algorithm: Learn more.</a>
Score 8.6 out of 101

Cassandra

Top Rated
61 Ratings
<a href='https://www.trustradius.com/static/about-trustradius-scoring' target='_blank' rel='nofollow'>trScore algorithm: Learn more.</a>
Score 8.2 out of 101

Add comparison

Likelihood to Recommend

Apache Spark

When the data is very big, and you cannot afford a lot of computational timing such as in a real-time environment, it is advisable to use Apache Spark. There are alternatives to Apache Spark, but it is the most common and robust tool to work with. It is great at batch processing.
Kartik Chavan profile photo

Cassandra

Apache Cassandra is a NoSQL database and well suited where you need highly available, linearly scalable, tunable consistency and high performance across varying workloads. It has worked well for our use cases, and I shared my experiences to use it effectively at the last Cassandra summit! http://bit.ly/1Ok56TKIt is a NoSQL database, finally you can tune it to be strongly consistent and successfully use it as such. However those are not usual patterns, as you negotiate on latency. It works well if you require that. If your use case needs strongly consistent environments with semantics of a relational database or if the use case needs a data warehouse, or if you need NoSQL with ACID transactions, Apache Cassandra may not be the optimum choice.
Rekha Joshi profile photo

Pros

  • We used to make our batch processing faster. Spark is faster in batch processing than MapReduce with it in memory computing
  • Spark will run along with other tools in the Hadoop ecosystem including Hive and Pig
  • Spark supports both batch and real-time processing
  • Apache Spark has Machine Learning Algorithms support
No photo available
  • Continuous availability: as a fully distributed database (no master nodes), we can update nodes with rolling restarts and accommodate minor outages without impacting our customer services.
  • Linear scalability: for every unit of compute that you add, you get an equivalent unit of capacity. The same application can scale from a single developer's laptop to a web-scale service with billions of rows in a table.
  • Amazing performance: if you design your data model correctly, bearing in mind the queries you need to answer, you can get answers in milliseconds.
  • Time-series data: Cassandra excels at recording, processing, and retrieving time-series data. It's a simple matter to version everything and simply record what happens, rather than going back and editing things. Then, you can compute things from the recorded history.
David Prinzing profile photo

Cons

  • Consumes more memory
  • Difficult to address issues around memory utilization
  • Expensive - In-memory processing is expensive when we look for a cost-efficient processing of big data
No photo available
  • Cassandra is a poor choice for implementing application queues.
  • NoSQL requires thinking differently, and can be challenging for people with strong relational database backgrounds to understand. The CQL language helps with this, but it pays to understand how the engine works under the hood. That said, the benefits outweigh the challenge of the learning curve!
  • Database compactions and anti-entropy repair can be burdensome on a busy cluster. Significant improvements have been made in recent versions, but it remains as an operational challenge.
David Prinzing profile photo

Likelihood to Renew

No score
No answers yet
No answers on this topic
Cassandra8.0
Based on 11 answers
I think this question is only relevant to user using the enterprise version. We are using the open-source (community) version, so renewal is not really an issue. We are happy with Cassandra and it serves its original purpose.
No photo available

Alternatives Considered

We specifically choose Spark over MapReduce to make the cluster processing faster
No photo available
Technology selection should be done based on the need and not based on buzz words in the market (google searching). If your data need flat file approach and more searchable based on index and partition keys, then it's better to go for Cassandra. Cassandra is a better choice compared with HBase because Cassandra has a lot of API's ready and available for map reducing queries (like materialized queries). Cassandra uses ring architecture approach, there is no master-slave approach (like HBase). If data published on of the node the data will get synced with other nodes in the ring architecture when compared to HBase which has a dedicated master node to orchestrate the data into its slaves.
Ravi Reddy profile photo

Return on Investment

  • We were able to make batch job faster by 20 times as compared to MapReduce
  • With the language support like Scala, Java, and Python, easily manageable
No photo available
  • The open source version of Cassandra is only suggested for learning the basic concepts and play with its core features. Unless you really want to invest a lot in your developers and architects knowing every detail of Cassandra, I prefer the DataStax enterprise version. Although the license cost is relatively high, I think they it is worth it. I'm thinking about the support, the monitoring tool OpsCenter, and the integration of Solr and Spark (for data analysis).
  • Cassandra didn't fully replace our old and traditional relation database Oracle. In addition, it opens another door for us to deal with some special business use cases that NoSQL database can do better in a more feasible and efficient way.
yixiang Shan profile photo

Pricing Details

Apache Spark

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee?
No
Additional Pricing Details

Cassandra

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee?
No
Additional Pricing Details