Cassandra - pretty good if you know what you are doing
October 13, 2015

Anonymous | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Software Version

2.1.5

Overall Satisfaction with Cassandra

Cassandra is being used as a time series store for sensor data and is used by several researchers within our department.
It serves as the storage layer in our home-grown sensor analytics platform, which uses Spark for computation. We use it to store billions of samples of wearable sensor data collected in various studies and experiments.
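To make the time series use case concrete, here is a minimal sketch of how samples might be bucketed into partitions so that no single partition grows without bound. The table name, the column names and the one-day bucket size are assumptions chosen for illustration, not our actual schema.

```python
from datetime import datetime, timezone
from typing import Tuple

# Hypothetical CQL schema: names and the daily bucket are illustrative only.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS sensor_samples (
    sensor_id text,
    day       text,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)
)
"""

def partition_key(sensor_id: str, ts: datetime) -> Tuple[str, str]:
    """Return the (sensor_id, day) bucket a sample falls into.

    Expects a timezone-aware datetime; the day is computed in UTC.
    """
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (sensor_id, day)

def bucket_samples(samples):
    """Group (sensor_id, ts, value) samples by partition, ordered by time."""
    partitions = {}
    for sensor_id, ts, value in samples:
        key = partition_key(sensor_id, ts)
        partitions.setdefault(key, []).append((ts, value))
    for rows in partitions.values():
        rows.sort(key=lambda row: row[0])  # mimics clustering order on ts
    return partitions
```

With a layout like this, reading one sensor's data for one day touches a single partition, which is the access pattern Cassandra handles best.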
  • High Availability - we utilize the data replication features of Cassandra. This enables us to access our data even when several nodes have gone down.
  • Data Locality - our architecture combines Cassandra storage nodes and computation nodes in the same machine. This enables us to utilize data locality and limit expensive network IO to read data.
  • Elasticity - Cassandra has a shared-nothing architecture. Nodes can be added very easily and they discover the network topology. As soon as a node has joined the Cassandra ring, the data is rebalanced and the new node's share is streamed to it automatically.
  • Cassandra runs on the JVM and therefore may require a lot of GC tuning for read/write intensive applications.
  • Requires manual periodic maintenance - for example it is recommended to run a cleanup on a regular basis.
  • There are a lot of knobs and buttons to configure the system. For many cases the default configuration will be sufficient, but if it's not, you will need significant ramp-up on the inner workings of Cassandra in order to tune it effectively.
Compaction may take a significant amount of time, and at times it will not complete. Compaction requires resources, so cluster performance will be degraded during that time.
Cassandra CQL does not support many SQL features. It is limited due to the architecture of the system.
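The elasticity point above (a new node joins the ring and receives its share of the data) can be illustrated with a toy consistent-hashing ring. This is only a sketch under simplifying assumptions: one token per node, no replication, and MD5 as a stand-in hash. The `Ring` class and `moved_keys` helper are hypothetical names, not part of Cassandra or its drivers.

```python
import bisect
import hashlib

def token(value: str) -> int:
    """Map a string onto the ring. MD5 is a stand-in, not Cassandra's partitioner."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class Ring:
    """Toy ring: each node owns the keys between the previous token and its own."""

    def __init__(self, nodes):
        self._tokens = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        bisect.insort(self._tokens, (token(node), node))

    def owner(self, key):
        """First node whose token is >= the key's token, wrapping around."""
        i = bisect.bisect_left(self._tokens, (token(key), ""))
        return self._tokens[i % len(self._tokens)][1]

def moved_keys(ring, keys, new_node):
    """Add new_node and return {key: (old_owner, new_owner)} for keys that moved."""
    before = {k: ring.owner(k) for k in keys}
    ring.add_node(new_node)
    after = {k: ring.owner(k) for k in keys}
    return {k: (before[k], after[k]) for k in keys if before[k] != after[k]}

ring = Ring(["node-a", "node-b", "node-c"])
keys = ["sensor-%d" % i for i in range(1000)]
moved = moved_keys(ring, keys, "node-d")
print("keys streamed to the new node:", len(moved), "of", len(keys))
```

Every key that changes owner moves to the new node; the existing nodes never exchange data with each other, which is why adding capacity is comparatively cheap.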
  • This question is not relevant, as I work in a non-profit educational institution.
We also evaluated MySQL and MongoDB. Both of them have their strengths and weaknesses, but they are less suited for storing massive amounts of time series data. In addition, they are not elastic by nature, and we required a "future-proof" solution as it was difficult to estimate how much data we would need to store.
I think this question is only relevant to users of the enterprise version. We are using the open-source (community) version, so renewal is not really an issue. We are happy with Cassandra and it serves its original purpose.
Cassandra has excellent high availability and partition tolerance and has a robust architecture.
It is well suited for storing immutable data as deletes are extremely inefficient. As such, it is well suited for data archive and deep storage.
It is less appropriate for OLAP as it has limited aggregation and filtering abilities, and no grouping whatsoever.
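Because there is no grouping in CQL, this kind of aggregation has to happen outside Cassandra (in our platform, Spark handles the computation). As a rough illustration of the work that gets pushed to the client, here is a plain Python sketch. The `mean_by_key` function and the row layout are assumptions made for the example.

```python
from collections import defaultdict

def mean_by_key(rows, key, field):
    """Group rows by `key` and average `field`, the work a SQL GROUP BY would do."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[field]
        counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Rows as they might come back from a plain SELECT (hypothetical layout).
rows = [
    {"sensor_id": "a", "value": 1.0},
    {"sensor_id": "a", "value": 3.0},
    {"sensor_id": "b", "value": 5.0},
]
print(mean_by_key(rows, "sensor_id", "value"))
```

For billions of samples this loop would of course run as a distributed job rather than on a single client.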