Overall Satisfaction with Riak
I used Riak's ability to link objects in the database to build up a hierarchical tree of documents representing student administrative and testing data. This was part of a project to integrate dozens of systems at a large technology and education company. Riak served as the basis for the data model in the operational data store.
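To make the linking idea concrete, here is a minimal sketch of that kind of hierarchy. It is an in-memory stand-in, not the Riak client API: `LinkedObject`, `InMemoryStore`, `walk`, the bucket names, and the sample records are all hypothetical. The only thing borrowed from Riak is the notion that a link is a (bucket, key, tag) triple attached to an object.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedObject:
    """Hypothetical stand-in for a stored object that carries links."""
    bucket: str
    key: str
    data: dict
    # Each link is a (bucket, key, tag) triple pointing at another object.
    links: list = field(default_factory=list)


class InMemoryStore:
    """Toy key/value store used only for illustration (not a Riak client)."""

    def __init__(self):
        self._objects = {}

    def put(self, obj):
        self._objects[(obj.bucket, obj.key)] = obj

    def get(self, bucket, key):
        return self._objects[(bucket, key)]

    def walk(self, obj, tag):
        """Return the objects that `obj` links to under the given tag."""
        return [self.get(b, k) for (b, k, t) in obj.links if t == tag]


def build_example():
    """Build a tiny student -> enrollment -> test result tree (made-up data)."""
    store = InMemoryStore()
    student = LinkedObject("students", "s-1", {"name": "Example Student"})
    enrollment = LinkedObject("enrollments", "e-1", {"course": "Algebra I"})
    result = LinkedObject("test_results", "t-1", {"score": 87})

    student.links.append(("enrollments", "e-1", "enrollment"))
    enrollment.links.append(("test_results", "t-1", "result"))

    for obj in (student, enrollment, result):
        store.put(obj)
    return store


def scores_for(store, student_key):
    """Follow links two levels down to collect a student's test scores."""
    student = store.get("students", student_key)
    scores = []
    for enrollment in store.walk(student, "enrollment"):
        for result in store.walk(enrollment, "result"):
            scores.append(result.data["score"])
    return scores
```

Calling `scores_for(build_example(), "s-1")` follows the student's links down to the test results, which is the "just enough relational capability" this kind of model gives you without a schema or joins.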
- If you're considering a NoSQL solution, one of Riak's strengths is that it is built to scale with very low management overhead. Compare Riak to something like HBase (which requires a full Hadoop cluster, along with YARN and ZooKeeper), and you'll find Riak much easier to set up and maintain.
- Schemaless design in Riak makes it really easy to apply whatever design you like. Since you're not locked into seeing things just the SQL way, you've got more freedom with the type of data you store and the way you store it.
- Riak is highly reliable. It's built on a platform that's meant to be incredibly resistant to failure. If you run Riak in a cluster or cloud-based environment, you can trust that it will be very dependable.
- At the time I was using Riak, data was stored as blobs, so you couldn't query the data directly on the server. This has since been remedied with the addition of full-text search support.
- Riak's simple API and simple management model made it a no-brainer to adopt as a technology for the team.
At the time I worked on the project, those were the three competing technologies I evaluated. Couchbase didn't have memcache integrated at the time. Riak was by far the easiest to set up, and its linking capability struck the right balance, giving us just enough relational capability for our needs.
Right now, I'm on a project where we need databases that can run on embedded systems. Riak isn't necessarily the best fit for that scenario. But when we need a clustered database, that's where we'd start considering Riak.
When I'm considering analysis of a large data set using machine learning algorithms, Riak is admittedly not my first choice. I'd probably look at something like Spark to run a distributed algorithm on my data, which means I'd have to copy the data out of Riak into HDFS before running the analysis. Good integration between Riak and Spark would be a welcome addition, since it would save a lot of time moving data between the Riak cluster and the HDFS cluster.