Cassandra - pretty good if you know what you are doing
It serves as the storage layer of our homegrown sensor analytics platform, which uses Spark for computation. We use it to store billions of samples of wearable sensor data collected in various studies and experiments.
Pros
- High Availability - we rely on Cassandra's data replication. This lets us access our data even when several nodes are down.
- Data Locality - our architecture places Cassandra storage nodes and computation nodes on the same machines. This lets us exploit data locality and limit expensive network I/O when reading data.
- Elasticity - Cassandra has a shared-nothing architecture. Nodes can be added very easily, and they discover the network topology on their own. As soon as a node joins the Cassandra ring, it takes ownership of part of the token range and the corresponding data is streamed to it automatically from the existing nodes.
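
To make the availability and elasticity points more concrete, here is a minimal Python sketch of a token ring with replication. It is a toy model, not Cassandra's implementation: it assumes one token per node (real clusters usually use many virtual nodes), MD5 as a stand-in partitioner, and simple clockwise replica placement. The names ToyRing, token, quorum and readable, as well as the node and key names, are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (toy stand-in for a partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Toy ring: one token per node, replicas placed clockwise."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.ring = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        # The new node claims a position on the ring; nothing else moves.
        bisect.insort(self.ring, (token(node), node))

    def replicas(self, key):
        """Return the first rf nodes found walking clockwise from the key."""
        tokens = [t for t, _ in self.ring]
        start = bisect.bisect_left(tokens, token(key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM operation."""
    return replication_factor // 2 + 1


def readable(ring, key, down, required):
    """True if at least `required` replicas of `key` sit on live nodes."""
    live = [n for n in ring.replicas(key) if n not in down]
    return len(live) >= required


if __name__ == "__main__":
    ring = ToyRing([f"node{i}" for i in range(1, 7)], replication_factor=3)
    keys = [f"sensor-{i}" for i in range(1000)]

    # With RF=3, QUORUM needs 2 replicas, so losing any single node is tolerated.
    ok = all(readable(ring, k, {"node2"}, quorum(3)) for k in keys)
    print("all keys readable at QUORUM with node2 down:", ok)

    # Adding a node changes only the replica sets that now include it.
    before = {k: ring.replicas(k) for k in keys}
    ring.add_node("node7")
    moved = [k for k in keys if ring.replicas(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys changed replica set")
```

In this model, how many node failures are tolerated depends on the replication factor and the consistency level required, and adding a node only affects keys whose replica set now includes the new node. That is why data can be streamed to it without reshuffling the whole cluster.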
Cons
- Cassandra runs on the JVM and therefore may require a lot of GC tuning for read/write-intensive applications.
- Requires periodic manual maintenance - for example, it is recommended to run repairs on a regular basis and to run a cleanup after adding nodes.
- There are a lot of knobs and buttons for configuring the system. In many cases the default configuration will be sufficient, but if it's not, you will need significant ramp-up on the inner workings of Cassandra in order to tune it effectively.
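
As an illustration of the periodic maintenance mentioned above, here is a small Python sketch of how such a routine might be planned. It rests on assumptions that are not part of the original setup: that the recurring task is a primary-range repair (nodetool repair -pr), that cleanup (nodetool cleanup) is only needed after the topology changes, and that a repair cycle should finish within gc_grace_seconds (10 days by default). The function names and the keyspace name "sensors" are hypothetical.

```python
from datetime import timedelta

# Cassandra's default gc_grace_seconds is 864000 seconds (10 days).
DEFAULT_GC_GRACE = timedelta(seconds=864_000)


def repair_schedule_is_safe(repair_interval, repair_duration,
                            gc_grace=DEFAULT_GC_GRACE):
    """Conservative rule of thumb: the gap between repairs plus the time a
    repair takes should stay below gc_grace_seconds, so deletions reach
    every replica before their tombstones are purged."""
    return repair_interval + repair_duration < gc_grace


def maintenance_commands(keyspace, topology_changed=False):
    """Build (but do not run) the nodetool commands for one node."""
    commands = [["nodetool", "repair", "-pr", keyspace]]
    if topology_changed:
        # Cleanup drops data the node no longer owns after nodes were added.
        commands.append(["nodetool", "cleanup", keyspace])
    return commands


if __name__ == "__main__":
    weekly = repair_schedule_is_safe(timedelta(days=7), timedelta(days=1))
    print("weekly repair fits within gc_grace_seconds:", weekly)
    for cmd in maintenance_commands("sensors", topology_changed=True):
        print(" ".join(cmd))
```

The commands are only assembled here, not executed. In practice they would be run on each node by a scheduler or an operations tool, and the right interval depends on the cluster's own gc_grace_seconds setting.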