What users are saying about

Amazon EMR

24 Ratings

Cassandra

Top Rated
61 Ratings

Amazon EMR

Score 8.3 out of 10

Cassandra

Score 8.2 out of 10

Likelihood to Recommend

Amazon EMR

If you don't have big data, i.e., petabytes of data with terabytes more generated every day, then don't use Hadoop. Relational databases are enough for terabytes of data. Hadoop is not well suited to transactional systems or data.

Cassandra

Apache Cassandra is a NoSQL database that is well suited to cases where you need high availability, linear scalability, tunable consistency, and high performance across varying workloads. It has worked well for our use cases, and I shared my experiences of using it effectively at the last Cassandra summit! http://bit.ly/1Ok56TK It is a NoSQL database, but you can tune it to be strongly consistent and use it successfully as such. However, those are not the usual patterns, because you compromise on latency. It works well if you require that. If your use case needs a strongly consistent environment with the semantics of a relational database, or needs a data warehouse, or needs NoSQL with ACID transactions, Apache Cassandra may not be the optimum choice.
Rekha Joshi
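
The tunable consistency mentioned in the review above comes down to simple replica arithmetic: a read is guaranteed to see the latest acknowledged write when the number of replicas contacted on read plus the number contacted on write exceeds the replication factor. The following is a minimal sketch of that rule in plain Python. It does not talk to a cluster, and the function names (`quorum`, `is_strongly_consistent`) are hypothetical, chosen only for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas for a given replication factor."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor: int,
                           read_replicas: int,
                           write_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    n = 3          # assumed replication factor, for illustration only
    q = quorum(n)  # majority of 3 replicas
    # Quorum reads plus quorum writes always overlap on at least one replica.
    print(is_strongly_consistent(n, q, q))
    # Single-replica reads and writes favour latency over consistency.
    print(is_strongly_consistent(n, 1, 1))
```

With a replication factor of 3, quorum reads and writes share at least one replica, while reading and writing at a single replica gives up that guarantee in exchange for lower latency, which is the trade-off the reviewer describes.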

Pros

Amazon EMR

  • Distributed computing
  • Fault tolerant
  • Uptime

Cassandra

  • Continuous availability: as a fully distributed database (no master nodes), we can update nodes with rolling restarts and accommodate minor outages without impacting our customer services.
  • Linear scalability: for every unit of compute that you add, you get an equivalent unit of capacity. The same application can scale from a single developer's laptop to a web-scale service with billions of rows in a table.
  • Amazing performance: if you design your data model correctly, bearing in mind the queries you need to answer, you can get answers in milliseconds.
  • Time-series data: Cassandra excels at recording, processing, and retrieving time-series data. It's a simple matter to version everything and simply record what happens, rather than going back and editing things. Then, you can compute things from the recorded history.
David Prinzing
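
The time-series point above (record what happens rather than editing in place, then compute from the history) can be illustrated with a small in-memory sketch. This is plain Python, not Cassandra or CQL. The `Reading` type, the `append`, `latest`, and `average` helpers, and the series name are all hypothetical stand-ins for a table partitioned by series and ordered by timestamp.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Reading:
    """One immutable observation; it is never updated after being recorded."""
    timestamp: int
    value: float


# Hypothetical store: one append-only list of readings per series key.
History = Dict[str, List[Reading]]


def append(history: History, series: str, timestamp: int, value: float) -> None:
    """Record a new observation instead of overwriting an old one."""
    history[series].append(Reading(timestamp, value))


def latest(history: History, series: str) -> Optional[Reading]:
    """Derive the current state from the recorded history."""
    readings = history.get(series)
    if not readings:
        return None
    return max(readings, key=lambda r: r.timestamp)


def average(history: History, series: str, start: int, end: int) -> Optional[float]:
    """Average of the values recorded in the closed interval [start, end]."""
    values = [r.value for r in history.get(series, []) if start <= r.timestamp <= end]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    history: History = defaultdict(list)
    append(history, "sensor-1", 100, 20.0)
    append(history, "sensor-1", 200, 22.0)  # a later reading is a new row, not an edit
    print(latest(history, "sensor-1"))
    print(average(history, "sensor-1", 100, 200))
```

Because nothing is ever overwritten, any derived value (latest reading, average over a window) can be recomputed from the history at any time.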

Cons

Amazon EMR

  • Cost overhead is a bit high
  • Limited versions of frameworks that can be used

Cassandra

  • Cassandra is a poor choice for implementing application queues.
  • NoSQL requires thinking differently, and can be challenging for people with strong relational database backgrounds to understand. The CQL language helps with this, but it pays to understand how the engine works under the hood. That said, the benefits outweigh the challenge of the learning curve!
  • Database compactions and anti-entropy repair can be burdensome on a busy cluster. Significant improvements have been made in recent versions, but it remains an operational challenge.
David Prinzing

Likelihood to Renew

Amazon EMR

No score
No answers on this topic

Cassandra

8.0
Based on 11 answers
I think this question is only relevant to users of the enterprise version. We are using the open-source (community) version, so renewal is not really an issue. We are happy with Cassandra and it serves its original purpose.

Alternatives Considered

Amazon EMR

The alternatives to EMR are mainly Hadoop distributions owned by the 3 companies above. I have not used the other distributions, so it is difficult to comment, but the general tradeoff is that, at the cost of a longer setup time and more infrastructure management, you get more flexible versioning and potentially faster access to newer versions of some frameworks such as Spark.

Cassandra

We also evaluated MySQL and MongoDB. Both have their strengths and weaknesses, but they are less suited to storing massive amounts of time-series data. In addition, they are not elastic by nature, and we required a "future-proof" solution because it was difficult to estimate how much data we would need to store.

Return on Investment

Amazon EMR

  • It was obviously cheaper and more convenient to use, as most of our data processing and pipelines are on AWS. It was fast and readily available with a click, which saved a ton of time compared with having to work around cluster downtime on premises.
  • It saved time on processing chunks of big data that had to be processed in a short period at minimal cost. EMR solved this, as cluster setup and processing were simple, easy, cheap, and fast.
  • It had a negative impact in that it was very difficult to submit test jobs, as it lacks a UI for submitting Spark code snippets.

Cassandra

  • The open-source version of Cassandra is only suggested for learning the basic concepts and playing with its core features. Unless you really want to invest a lot in your developers and architects knowing every detail of Cassandra, I prefer the DataStax enterprise version. Although the license cost is relatively high, I think it is worth it. I'm thinking about the support, the monitoring tool OpsCenter, and the integration of Solr and Spark (for data analysis).
  • Cassandra didn't fully replace our old, traditional relational database, Oracle. Rather, it opens another door for us to handle some special business use cases that a NoSQL database can handle in a more feasible and efficient way.
yixiang Shan

Pricing Details

Amazon EMR

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee? No
Additional Pricing Details

Cassandra

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee? No
Additional Pricing Details