Item: Apache Cassandra
Rating: 8
Author: Verified User

Overall Satisfaction with Cassandra

Use Cases and Deployment Scope

We use Cassandra as a time series store for sensor data, and several researchers in our department rely on it.
It serves as the storage layer of our home-grown sensor analytics platform, which uses Spark for computation. We use it to store billions of samples of wearable sensor data collected in various studies and experiments.
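The review does not describe its schema, so the following is only a sketch of one common way to lay out sensor time series in Cassandra. It assumes each partition holds one sensor's samples for one UTC day, with samples ordered by timestamp inside the partition. The table name `samples` and the helpers `partition_key` and `group_into_partitions` are hypothetical, and the Python only mimics how rows would be grouped. It does not talk to a cluster.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical table layout this sketch mirrors (an assumption, not the reviewer's schema):
#   CREATE TABLE samples (
#       sensor_id text, day text, ts timestamp, value double,
#       PRIMARY KEY ((sensor_id, day), ts)
#   ) WITH CLUSTERING ORDER BY (ts ASC);

def partition_key(sensor_id, ts):
    """Bucket samples per sensor per UTC day so no single partition grows without bound."""
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (sensor_id, day)

def group_into_partitions(samples):
    """Group (sensor_id, ts, value) samples by partition key, sorted by time within each."""
    partitions = defaultdict(list)
    for sensor_id, ts, value in samples:
        partitions[partition_key(sensor_id, ts)].append((ts, value))
    for rows in partitions.values():
        rows.sort(key=lambda row: row[0])  # plays the role of the `ts` clustering column
    return dict(partitions)

if __name__ == "__main__":
    samples = [
        ("wrist-01", datetime(2017, 3, 1, 23, 59, tzinfo=timezone.utc), 0.4),
        ("wrist-01", datetime(2017, 3, 2, 0, 1, tzinfo=timezone.utc), 0.5),
        ("wrist-01", datetime(2017, 3, 1, 12, 0, tzinfo=timezone.utc), 0.1),
    ]
    for key, rows in group_into_partitions(samples).items():
        print(key, [value for _, value in rows])
```

The day bucket is a design choice. It caps partition size for long-running studies, and a query for one sensor over a time window touches only a handful of partitions.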

Pros and Cons

Pros

High Availability - we use Cassandra's data replication features, which let us access our data even when several nodes have gone down.
Data Locality - our architecture places Cassandra storage nodes and computation nodes on the same machines. This lets us exploit data locality and limit expensive network I/O when reading data.
Elasticity - Cassandra has a shared-nothing architecture. Nodes can be added very easily, and they discover the network topology on their own. As soon as a node joins the Cassandra ring, token ownership is rebalanced and the data the new node is now responsible for is streamed to it automatically from the existing nodes.
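The replication and elasticity points above both come from Cassandra's token ring. The toy model below sketches that idea under simplifying assumptions: one token per node (no vnodes), SimpleStrategy-like placement on the next RF nodes clockwise, and MD5 in place of Cassandra's Murmur3 partitioner. `Ring`, `token` and `quorum` are hypothetical names, not Cassandra APIs. With RF=3, a QUORUM read or write needs 2 replicas, so each key tolerates one replica being down. When a node joins, only keys near its position on the ring change replicas.

```python
import bisect
import hashlib

def token(key):
    # Illustrative 64-bit token from MD5; Cassandra's default partitioner uses Murmur3 instead.
    return int(hashlib.md5(key.encode()).hexdigest()[:16], 16)

class Ring:
    """Toy token ring: one token per node, replicas on the next RF nodes clockwise."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.entries = []  # sorted (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # A joining node claims a position on the ring; only keys near it change replicas.
        bisect.insort(self.entries, (token(node), node))

    def replicas(self, key):
        tokens = [t for t, _ in self.entries]
        # The first node whose token is >= the key's token is the primary (wrapping around).
        start = bisect.bisect_left(tokens, token(key)) % len(self.entries)
        count = min(self.rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]

def quorum(rf):
    """Number of replicas that must answer a QUORUM read or write."""
    return rf // 2 + 1

if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    print("replicas for sensor-42:", ring.replicas("sensor-42"))
    print("QUORUM with RF=3 needs", quorum(3), "replicas, so one replica per key may be down")
```

In a real cluster, whether "several nodes" can be down at once depends on the replication factor, the consistency level and which replicas the failed nodes hold. The model only illustrates the per-key arithmetic.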

Cons

Cassandra runs on the JVM and therefore may require a lot of GC tuning for read/write-intensive applications.
Requires periodic manual maintenance - for example, it is recommended to run a cleanup (nodetool cleanup) on a regular basis.
There are a lot of knobs and buttons for configuring the system. In many cases the default configuration is sufficient, but if it's not, you will need a significant ramp-up on Cassandra's inner workings to tune it effectively.
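The cleanup mentioned above removes data that a node no longer owns, for example after new nodes have joined and taken over part of its token range. The script below is a minimal sketch of how that periodic maintenance might be scripted, for instance from cron. It assumes `nodetool` is on the PATH and accepts `-h <host>`. The host addresses, the keyspace name `sensors` and the helper names are hypothetical. By default it only prints the commands.

```python
import shlex
import subprocess

def cleanup_commands(hosts, keyspaces):
    """Build one `nodetool cleanup` invocation per host and keyspace."""
    return [["nodetool", "-h", host, "cleanup", ks] for host in hosts for ks in keyspaces]

def run_cleanup(hosts, keyspaces, dry_run=True):
    for cmd in cleanup_commands(hosts, keyspaces):
        if dry_run:
            print(shlex.join(cmd))
        else:
            # subprocess.run blocks, so nodes are cleaned one at a time;
            # cleanup rewrites SSTables and is best run off-peak.
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_cleanup(["10.0.0.1", "10.0.0.2"], ["sensors"], dry_run=True)
```

Running the nodes sequentially is a deliberate choice. Cleanup competes with normal traffic for disk and CPU, so cleaning all nodes at once would degrade the whole cluster.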

Return on Investment

This question is not relevant, as I work in a non-profit educational institution.

Alternatives Considered

We also evaluated MySQL and MongoDB. Both have their strengths and weaknesses, but they are less suited to storing massive amounts of time series data. In addition, they are not elastic by nature, and we needed a "future-proof" solution because it was difficult to estimate how much data we would need to store.

Likelihood to Renew

I think this question is only relevant to users of the enterprise version. We use the open-source (community) version, so renewal is not really an issue. We are happy with Cassandra, and it serves its original purpose.

Likelihood to Recommend

Cassandra offers excellent availability and partition tolerance, and its architecture is robust.
It is well suited to storing immutable data, because deletes are extremely inefficient (a delete writes a tombstone marker rather than removing the data in place). This makes it a good fit for data archives and deep storage.
It is less appropriate for OLAP, as it has limited aggregation and filtering capabilities and no grouping whatsoever.
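To see why deletes are expensive in Cassandra's log-structured storage, consider the toy model below. Writes go to immutable segments (stand-ins for SSTables), and a delete is just another write: a tombstone that shadows older values until compaction merges the segments. `TinyLSM` is a hypothetical illustration, not Cassandra code. Real Cassandra also keeps tombstones for at least `gc_grace_seconds` (a table option, 10 days by default) before compaction may purge them.

```python
class TinyLSM:
    """Toy log-structured store: writes go to immutable segments; deletes write tombstones."""

    TOMBSTONE = object()

    def __init__(self):
        self.segments = []  # flushed segments, oldest first (stand-ins for SSTables)
        self.memtable = {}

    def put(self, key, value):
        self.memtable[key] = value

    def delete(self, key):
        # A delete adds data: a marker that shadows any older value for the key.
        self.memtable[key] = self.TOMBSTONE

    def flush(self):
        if self.memtable:
            self.segments.append(self.memtable)
            self.memtable = {}

    def get(self, key):
        # The newest entry wins: check the memtable, then segments newest-first.
        for layer in [self.memtable, *reversed(self.segments)]:
            if key in layer:
                value = layer[key]
                return None if value is self.TOMBSTONE else value
        return None

    def stored_entries(self):
        return sum(len(seg) for seg in self.segments) + len(self.memtable)

    def compact(self):
        merged = {}
        for seg in self.segments:  # oldest first, so newer entries overwrite older ones
            merged.update(seg)
        # All flushed segments were merged, so no older copy remains for a tombstone to shadow.
        self.segments = [{k: v for k, v in merged.items() if v is not self.TOMBSTONE}]

if __name__ == "__main__":
    db = TinyLSM()
    for i in range(3):
        db.put(f"k{i}", i)
    db.flush()
    db.delete("k1")
    db.flush()
    print("entries before compaction:", db.stored_entries())
    db.compact()
    print("entries after compaction:", db.stored_entries())
```

Until compaction runs, the delete increases storage and every read of that key must consult the tombstone. This is why workloads that rarely delete, such as archives of immutable sensor samples, fit this model well.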


Cassandra - pretty good if you know what you are doing
