Overall Satisfaction with Cassandra
We wanted to use Cassandra to load the millions of metrics we collect daily from our user base. After collecting the data, we also needed to perform calculations and run SQL-like queries. The only database that came to mind that does all of those things well is Cassandra.
- Automatic data sharding between nodes
- High availability
- Python driver support
- Managing Cassandra nodes (adding, removing)
- Need a separate tool to have a console (DataStax OpsCenter)
- We were able to consolidate costs by moving from Redshift to a 3-node Cassandra cluster.
- Couchbase Server
Cassandra does one thing very well: it can collect any type of metrics and analytics data and store it very quickly. But when it comes to reading the data, there are minor performance issues. That's where other databases such as CouchDB or Couchbase come in; they do just the opposite very well. Couchbase can read data very fast, but writing data is a bit costly.
I would give Cassandra a higher rating only if managing a cluster became easier. Currently we need a team of at least 2-3 people to manage a 10+ node cluster. The cluster needs a lot of maintenance, cleanup, and monitoring, and generally demands a lot of attention. We would need a dedicated resource just for managing a Cassandra cluster. If more of these simple tasks were automated, we could slim down the team needed to manage a cluster.
Cassandra performed very well when we were writing ~300 GB of data per day on a 3-node cluster. Reads showed minor performance issues, which we expected. For applications that are very read-heavy, we would choose a different product such as Couchbase.
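
To put the ~300 GB per day figure in perspective, here is a rough back-of-envelope sketch of the average sustained write rate it implies. It assumes decimal units (1 GB = 1000 MB) and writes spread evenly over the day and across nodes. The function name and the `replication_factor` parameter are hypothetical, added only for illustration; the replication setting of the cluster is not stated above, so the default of 1 counts each byte once.

```python
def write_rate_mb_per_s(gb_per_day, nodes, replication_factor=1):
    """Average sustained write rate per node in MB/s (decimal units).

    Assumes writes are spread evenly across the day and across nodes.
    replication_factor is a hypothetical knob: each byte is counted once
    per replica, so 1 means no extra copies are counted.
    """
    seconds_per_day = 24 * 60 * 60
    total_mb = gb_per_day * 1000 * replication_factor
    return total_mb / seconds_per_day / nodes


# Whole-cluster ingest rate (treat the cluster as a single node).
cluster_rate = write_rate_mb_per_s(300, nodes=1)

# Per-node rate on a 3-node cluster, counting each byte once.
per_node_rate = write_rate_mb_per_s(300, nodes=3)

print(f"cluster: {cluster_rate:.2f} MB/s, per node: {per_node_rate:.2f} MB/s")
```

Under these assumptions the cluster averages roughly 3.5 MB/s of incoming data, or a little over 1 MB/s per node. Real traffic is usually bursty, so peak rates would be higher than this average.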