Overall Satisfaction with Riak
We use Riak as a NoSQL database that replicates between our datacenter and the cloud. Various teams in our organization write data to it, and that data is then replicated to the cloud for cloud-based service lookups. Riak allows us to provide data to the cloud in a secure manner via the "hub and spoke" replication model. Riak has proved instrumental in allowing us to move applications from a datacenter to the cloud.
- Riak is great at handling large volumes of requests. We've seen Riak perform well under large volume while keeping response times quite low.
- Riak is also fast, providing consistent sub-10ms reads in both the datacenter and the cloud.
- It is flexible, allowing storage of numerous data types. We heavily leverage this to store various JSON documents in a single bucket.
- We really like the RESTful interface that is provided. It makes the learning curve almost invisible and lets us get to market quickly with Riak.
- Deletes!!! On numerous occasions we've seen Riak "resurrect" deleted data. We've worked with Basho numerous times and tried multiple changes to the way we interact with Riak to prevent the problem, but it still remains. The deleted data seems to reappear weeks, even months, after the delete was issued. We've had to work around this issue by providing a "deleted" flag for all data objects stored in Riak. Thus, we do not delete but simply flip the flag. This is excess baggage we would really like to not have to worry about.
- Search. Currently there's no way to tell what data you have in Riak without already knowing a particular bucket/key. There is a way to list the keys for a given bucket, but due to performance implications this is not a viable method to look up data, especially when you have a large number of keys in the bucket.
- It provided a solution for us to securely write data to the cloud. This has been instrumental in allowing us to move more applications to the cloud. Writes are performed behind firewalls and then replicated to the cloud for application consumption. By moving more applications to the cloud, we free up internal resources and can serve information in a much more scalable and reliable way.
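For illustration, the "deleted" flag workaround described above might look roughly like the sketch below. It uses a plain in-memory class as a stand-in for a bucket; `InMemoryBucket`, `soft_delete`, and `read_live` are hypothetical names made up for this example and are not part of any Riak client.

```python
import json


class InMemoryBucket:
    """Hypothetical stand-in for a key/value bucket (not the Riak client API)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Store the document as a JSON string, as a key/value store would.
        self._data[key] = json.dumps(value)

    def get(self, key):
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None


def soft_delete(bucket, key):
    """Flip the 'deleted' flag instead of issuing a real delete."""
    doc = bucket.get(key)
    if doc is None:
        return False
    doc["deleted"] = True
    bucket.put(key, doc)
    return True


def read_live(bucket, key):
    """Return the document only if it exists and is not flagged as deleted."""
    doc = bucket.get(key)
    if doc is None or doc.get("deleted", False):
        return None
    return doc
```

The cost of this approach is that every reader has to go through something like `read_live`, and flagged objects keep taking up space until something else purges them.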
Riak is a key/value store, which is great for certain use cases. For our use case, the ability to search is an extremely useful feature. Apache Cassandra can provide this while Riak cannot. Also for our use case, the ability to delete is critical: we strive to maintain clean data, which means we like to purge old or obsolete data. Riak, while providing the ability to do so, is not reliable here, as we've seen data resurrect on numerous occasions. Apache Cassandra allows for deletes, and in our proof-of-concept testing, where we explicitly tested this feature, it permanently deleted the data. One other key feature for us in looking at Apache Cassandra is the ability to update multiple pieces of data simultaneously for a given row (Cassandra) or key (Riak). Riak only allows for updating at the key level by replacing the data that was there. Thus, if you have multiple threads updating the same data in Riak, contention issues arise and the possibility of overwriting data is a real concern. Apache Cassandra helps with this use case by storing the data in columns rather than one big value. Thus, writers updating different columns for a given key do not contend with each other in Cassandra.
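The overwrite concern can be shown with a small, deterministic sketch. It uses plain Python dicts rather than real Riak or Cassandra clients, and the function names and the `order:1` document are invented for the example. Two writers each change a different field of the same record: once by replacing the whole value, once by writing only the field they changed.

```python
import copy


def whole_value_race():
    """Both writers read the full value, edit a copy, and write it all back."""
    store = {"order:1": {"status": "new", "total": 10}}

    # Both writers read before either one writes.
    copy_a = copy.deepcopy(store["order:1"])
    copy_b = copy.deepcopy(store["order:1"])

    copy_a["status"] = "shipped"   # writer A changes one field
    copy_b["total"] = 12           # writer B changes another field

    store["order:1"] = copy_a      # A replaces the whole value
    store["order:1"] = copy_b      # B replaces it again; A's change is lost
    return store["order:1"]


def per_field_race():
    """Each writer touches only the field (column) it changed."""
    store = {"order:1": {"status": "new", "total": 10}}

    store["order:1"]["status"] = "shipped"  # writer A
    store["order:1"]["total"] = 12          # writer B
    return store["order:1"]
```

In the first function writer A's status change disappears because writer B wrote back a stale copy of the whole value. In the second, both changes survive because the writers never touch the same field. Two writers updating the same column would still conflict, so this only helps when updates are spread across columns.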