CouchDB for analytics
March 31, 2017
Score 9 out of 10
Overall Satisfaction with CouchDB
It's being used as a document store for a social media analytics system, dumping thousands of updates an hour into a map/reduce system that generates reports and feeds into other task-specific databases. It's a more flexible alternative to relational databases like MySQL, and is easier to scale due to master-master replication.
- It can replicate and sync with web browsers via PouchDB. This lets you keep a synced copy of your database on the client side, which offers much faster data access than continuous HTTP requests would allow, and enables offline usage.
- Simple Map/Reduce support. The M/R system lets you process terabytes of documents in parallel, save the results, and reprocess only the documents that have changed on subsequent updates. While not as powerful as Hadoop, it is an easy-to-use query system that's hard to screw up.
- Sharding and Clustering support. As of CouchDB 2.0, it supports clustering and sharding of documents between instances without needing a load balancer to determine where requests should go.
- Master-to-master replication lets you clone, continuously back up, and listen for changes through the replication protocol, even over unreliable WAN links.
- When you make a request from a browser, CouchDB doesn't set the `Content-Type` response header to `application/json`; it incorrectly responds with `text/plain`. This issue has been reported multiple times, and patches have even been proposed, but so far they've been rejected.
- CouchDB doesn't support returning gzipped responses. You can get around this by using nginx in front of your CouchDB servers, but it could be faster if it were supported directly.
- Even in clustered mode, CouchDB nodes aren't able to share computed view data through replication. Each node needs to compute it on its own, which is a little wasteful.
- Faster development. Since it's a schemaless system, it's easy to add new fields and change the data model, so long as views stay the same.
- Lower cost of ownership. Unlike paid systems, CouchDB is totally free and supported by the Apache Software Foundation. The only thing you need to pay for is the hardware it runs on.
- Easier integration with other services. CouchDB uses an HTTP API for everything, and since nearly all languages have well-maintained HTTP libraries, it's easy to connect them to the database.
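
To illustrate the Map/Reduce and HTTP API points above, here is a minimal Python sketch that uses only the standard library. It builds a design document with one view, plus the request that would store it. The host `http://localhost:5984`, the database name `analytics`, the design document name `reports`, and the document fields `type` and `created_at` are assumptions made for illustration, not details of the system described in this review. The request is built but never sent.

```python
import json
import urllib.request

# Assumed values for illustration only.
BASE_URL = "http://localhost:5984"
DB_NAME = "analytics"

# CouchDB views are JavaScript functions stored as strings in a design document.
# This hypothetical map function emits one row per update, keyed by hour
# (it assumes created_at is an ISO 8601 timestamp string).
MAP_FUNCTION = """
function (doc) {
  if (doc.type === "update" && doc.created_at) {
    emit(doc.created_at.slice(0, 13), 1);
  }
}
""".strip()


def build_design_doc():
    """Return a design document with one view that counts updates per hour."""
    return {
        "_id": "_design/reports",
        "language": "javascript",
        "views": {
            "updates_per_hour": {
                "map": MAP_FUNCTION,
                "reduce": "_count",  # one of CouchDB's built-in reduce functions
            }
        },
    }


def build_put_request(doc):
    """Build (but do not send) the PUT request that stores a document."""
    url = f"{BASE_URL}/{DB_NAME}/{doc['_id']}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


def build_view_url(design, view):
    """Return the URL that reads the reduced view, grouped by key."""
    return f"{BASE_URL}/{DB_NAME}/_design/{design}/_view/{view}?group=true"
```

Passing the result of `build_put_request(build_design_doc())` to `urllib.request.urlopen` would store the view on a running server, which may also require credentials. Sending `Accept: application/json` explicitly is a common way to get a JSON content type back instead of `text/plain`.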
MongoDB and CouchDB are both document stores, but their concurrency models and ability to scale are very different. MongoDB cannot replicate or shard over unreliable links, and network partitions have caused data loss in the past. MongoDB has an easier query language, which the Mango query and index system in CouchDB 2.0 is only beginning to emulate. MongoDB also tends to have better documentation, and its communication protocol is a bit more efficient than HTTP.
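
For a sense of what a Mango query looks like, the sketch below builds a request for the `_find` endpoint introduced in CouchDB 2.0. The selector syntax resembles a MongoDB filter. The host, the `analytics` database, and the `type` and `shares` fields are hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumed local CouchDB 2.x node
DB_NAME = "analytics"               # hypothetical database name


def build_find_request(selector, fields=None, limit=25):
    """Build (but do not send) a POST to the _find endpoint."""
    body = {"selector": selector, "limit": limit}
    if fields is not None:
        body["fields"] = fields  # restrict which document fields are returned
    return urllib.request.Request(
        f"{BASE_URL}/{DB_NAME}/_find",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical query: updates with at least 100 shares.
request = build_find_request(
    {"type": "update", "shares": {"$gte": 100}},
    fields=["_id", "shares"],
)
```

Without a matching index, a query like this falls back to scanning every document, so in practice it would likely be paired with an index on the selected fields.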
CouchDB has no renewals because it's free, and I have no reason to stop using it.