TrustRadius
Riak Review: "Great product and fast, beware of deletes!"
Gerald Chenvert
December 01, 2015


Score 7 out of 10

Overall Satisfaction with Riak

We use Riak as a NoSQL database that replicates between our datacenter and the cloud. Various teams in our organization write data to it, and that data is then replicated to the cloud for cloud-based service lookups. Riak allows us to provide data to the cloud in a secure manner via the "hub and spoke" replication model. Riak has proved instrumental in allowing us to move applications from a datacenter to the cloud.
  • Riak is great at handling large volumes of requests. We've seen Riak perform well under large volume while keeping response times quite low.
  • Riak is also fast, providing consistent sub-10 ms reads in both the datacenter and the cloud.
  • Riak is flexible, allowing storage of numerous data types. We heavily leverage this to store various JSON documents in a single bucket.
  • We really like the RESTful interface that is provided. It makes the learning curve almost invisible and gets us to market quickly with Riak.
  • Deletes!!! On numerous occasions we've seen Riak "resurrect" deleted data. We've worked with Basho numerous times and tried multiple changes to the way we interact with Riak to prevent the problem, but it still remains. The deleted data seems to reappear weeks, even months, after the delete was issued. We've had to work around this issue by providing a "deleted" flag for all data objects stored in Riak. Thus, we do not delete but simply flip the flag. It is excess baggage we would really like to not have to worry about.
  • Search. Currently there's no way to tell what data you have in Riak without already knowing a particular bucket/key. There is a way to list the keys for a given bucket, but due to performance implications this is not a viable method to look up data, especially when you have a large number of keys in the bucket.
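To make the REST point and the key-listing caveat concrete, here is a minimal sketch of how the request URLs are shaped. It assumes Riak's HTTP layout of /buckets/<bucket>/keys/<key> on the default HTTP port 8098; the host, bucket, and key names are made up for illustration, and nothing is sent over the network.

```python
from urllib.parse import quote

# Assumption: a local node listening on Riak's default HTTP port.
RIAK_BASE = "http://127.0.0.1:8098"


def object_url(bucket: str, key: str, base: str = RIAK_BASE) -> str:
    """URL used to GET, PUT, or DELETE a single object by bucket and key."""
    return f"{base}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"


def list_keys_url(bucket: str, base: str = RIAK_BASE) -> str:
    """URL that lists every key in a bucket; expensive on large buckets."""
    return f"{base}/buckets/{quote(bucket, safe='')}/keys?keys=true"
```

Reads by known bucket/key go through the first URL and are the fast path; the second is the key listing that is not practical as a lookup method on a large bucket.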
I would once again mention the deletes. The idea of big data is that you can store lots and lots of data. Riak excels at this by still providing fast reads regardless of the amount of data. However, Riak doesn't seem to support "clean data". For use cases that require clean data, Riak makes it extremely hard to ensure your data is clean, because deleted data can resurrect seemingly without notice. As a result, many workarounds have to be performed and the size of the Riak dataset is ever growing. Riak can handle the large amount of data, but this will eventually become an issue if you need a clean data set rather than just storing and never deleting.
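A rough sketch of the "deleted" flag workaround, not the actual code: a plain dict stands in for a Riak bucket, and the field name `deleted` and the helper names are hypothetical.

```python
import json

DELETED_FIELD = "deleted"  # hypothetical flag name


def soft_delete(store: dict, key: str) -> None:
    """Mark the stored JSON document as deleted instead of removing the key."""
    doc = json.loads(store[key])
    doc[DELETED_FIELD] = True
    store[key] = json.dumps(doc)


def fetch(store: dict, key: str):
    """Return the document, or None if it is missing or flagged as deleted."""
    raw = store.get(key)
    if raw is None:
        return None
    doc = json.loads(raw)
    return None if doc.get(DELETED_FIELD) else doc


store = {"user:1": json.dumps({"name": "Ada"})}
soft_delete(store, "user:1")
```

After `soft_delete`, readers treat the object as gone, but the key and its value are still stored, which is why the dataset keeps growing.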
  • It provided a solution for us to securely write data to the cloud. This has been instrumental in allowing us to move more applications to the cloud. Writes are performed behind firewalls and then replicated to the cloud for application consumption. By moving more applications to the cloud, we free up internal resources and can serve information in a much more scalable and reliable way.
Riak is a key/value pair store which is great for certain use cases. For our use case, the ability to search is an extremely useful feature. Apache Cassandra can provide this while Riak cannot. Also again for our use case, the ability to delete is critical as we strive to maintain clean data which means we like to purge old or obsolete data. Riak, while providing the ability to do so, is not reliable as we've seen data resurrect on numerous occasions. Apache Cassandra allows for deletes and in our proof of concept testing where we've explicitly tested this feature, it permanently deleted the data. One other key feature for us looking at Apache Cassandra is the ability to update multiple pieces of data simultaneously for a given row (Cassandra) or key (Riak). Riak only allows for updating at the key level by replacing the data that was there. Thus, if you have multiple threads updating the same data in Riak, contention issues arise and the possibility of overwriting data is a real concern. Apache Cassandra helps this use case by storing the data in columns rather than one big value. Thus, updating various columns for a given key removes contention issues in Cassandra.
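The overwrite concern can be illustrated with a simplified in-memory model. This is only a sketch: plain dicts stand in for both stores, the function and key names are made up, and it ignores replication and any conflict-resolution features either database offers.

```python
import copy


def replace_whole_value(store, key, read_snapshot, field, value):
    """Key-level write: the writer sends back its whole (possibly stale) copy."""
    updated = copy.deepcopy(read_snapshot)
    updated[field] = value
    store[key] = updated


def update_column(store, key, field, value):
    """Column-style write: only the named field is touched."""
    store[key][field] = value


# Two writers read the same value, then each changes a different field.
kv = {"order:7": {"status": "new", "total": 10}}
snap_a = copy.deepcopy(kv["order:7"])
snap_b = copy.deepcopy(kv["order:7"])
replace_whole_value(kv, "order:7", snap_a, "status", "paid")
replace_whole_value(kv, "order:7", snap_b, "total", 12)  # overwrites A's change

# The same two changes applied as independent column updates.
cols = {"order:7": {"status": "new", "total": 10}}
update_column(cols, "order:7", "status", "paid")
update_column(cols, "order:7", "total", 12)
```

In the whole-value case the second writer's stale copy wins and the status change is lost; with per-column updates both changes survive.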
Riak works great for our use case, but the fact that deletes seem to resurrect is a real issue for us. Unless we can get this solved, we'll continue to look at other products to see if our use case fits. Otherwise, Riak is a great product and it fits our use case 95%. We have found workarounds for the remaining 5%.

Riak is well suited as a key value store. It does exactly what it says it does. If you have well known buckets/keys, Riak is a great solution for a ton of different use cases. The lack of ability to search is somewhat problematic for other use cases requiring this ability.

Also, while the ability to store a variable array of data into a single bucket/key is extremely useful, if you have a use case requiring parts of that value to be updated independently, Riak does not support transactions so you open yourself up to contention issues if the data is being updated regularly in small portions. One solution to this is to use multiple bucket/key parts to store the data, which will remove the contention issue, but then you have to increase your Riak footprint which results in more buckets and can sometimes make things more difficult than needed. This has been a nagging issue we've had to deal with on multiple occasions.
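The multiple bucket/key approach might look roughly like the following. It is a sketch under assumptions: a dict keyed by (bucket, key) stands in for Riak, and the "key:field" naming scheme is invented for illustration.

```python
def split_into_parts(bucket: str, key: str, doc: dict) -> dict:
    """Map one logical document to one (bucket, key) entry per field."""
    return {(bucket, f"{key}:{field}"): value for field, value in doc.items()}


def assemble(parts: dict, bucket: str, key: str) -> dict:
    """Rebuild the logical document from its parts.

    In a real store this would be one read per part rather than a scan.
    """
    prefix = f"{key}:"
    return {
        k[len(prefix):]: v
        for (b, k), v in parts.items()
        if b == bucket and k.startswith(prefix)
    }


parts = split_into_parts("orders", "7", {"status": "new", "total": 10})
parts[("orders", "7:status")] = "paid"  # writer A touches only its part
parts[("orders", "7:total")] = 12       # writer B touches only its part
```

Each writer now replaces only its own part, so the two updates no longer collide. The cost is more keys to manage, and the reader has to know every part name up front because listing keys is not a practical way to find them.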