September 13, 2017
Score 10 out of 10
Overall Satisfaction with HBase
HBase solves problems of scalability and management for multi-terabyte applications. It makes scaling out by adding nodes very easy, especially through Ambari. It is built with fault tolerance and availability in mind. You can run it on a single node, but it shines on multi-node infrastructure. Given its high data access speed and resiliency, I wouldn't recommend any other NoSQL database for general use.
- HBase data access and retrieval only gets better with larger scale.
- Fault tolerance is built in: if you have unreliable hardware, HBase will make every effort to keep your data online.
- Extremely fast key lookups and write throughput.
- Multi-tenancy is still a work in progress.
- Usability and beginner friendliness.
- It has a bad reputation for being complex.
HBase was always known as being friendly mainly to developers, so to mitigate that, a project called Apache Phoenix was created to provide a familiar SQL interface on top of HBase. Along with Apache Hive, this makes HBase accessible to mere mortals. For fast retrieval you'd use Phoenix, and for analytical workloads you'd use Hive with HBase underneath. Its open source nature also allows for wonderful contributions from a huge community.
- We were able to scale our application from 5TB running in a relational database to 20TB on top of HBase.
- Application availability was always high, even with half of the nodes having hardware issues.
- HBase can be used with standard mechanical hard drive storage. There's no need for fancy SAN or NAS storage, which is almost always expensive.
Typically, Cassandra is faster on reads and HBase is faster on writes. Cassandra is a common choice for website back ends, while HBase is simply a good general-use database engine. Cassandra has its own storage engine, whereas HBase uses HDFS and all its benefits. MongoDB is also typically used in web development; it has great support for JSON, but it has been known for poor scalability. It also uses its own storage engine.
I really haven't seen anything else out there that is comparable for my use cases. HBase has never proven me wrong. Some companies align their whole business on HBase and are moving all of their infrastructure from other database engines to HBase. It's also open source and has a very collaborative community.
HBase typically fits well in low-latency, tight-SLA scenarios. It is not recommended in situations where a relational database would be a better fit. In essence, if you're running a lot of analytical workloads or joins, HBase won't fit so well. If primary key access is sufficient, then HBase is a good fit.
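To make the "primary key access" point a bit more concrete, here is a toy Python model of that access pattern. It is not the HBase client API: `ToySortedTable`, the `user42#2017-09-12` style row keys and the `events:action` column are hypothetical, chosen only for illustration. The sketch mimics two properties HBase does have: rows are kept sorted by row key, and a scan is bounded by an inclusive start row and an exclusive stop row. Lookups and contiguous range scans are cheap, while a join would mean scanning and matching rows in client code.

```python
from bisect import bisect_left, insort


class ToySortedTable:
    """Hypothetical in-memory stand-in for an HBase table (not the real API).

    Rows are kept sorted by row key, so a lookup by key or a scan over a
    contiguous key range touches only the rows it needs.
    """

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        # Insert the row key in sorted position the first time it is seen.
        if row_key not in self._rows:
            insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Point lookup by row key; returns a copy, or None if absent.
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None

    def scan(self, start_row, stop_row):
        # Yield rows with start_row <= key < stop_row, in key order.
        i = bisect_left(self._keys, start_row)
        while i < len(self._keys) and self._keys[i] < stop_row:
            key = self._keys[i]
            yield key, dict(self._rows[key])
            i += 1


table = ToySortedTable()
# Hypothetical composite key "<user>#<date>" keeps one user's rows adjacent.
table.put("user42#2017-09-12", "events:action", "login")
table.put("user42#2017-09-13", "events:action", "purchase")
table.put("user7#2017-09-13", "events:action", "login")

print(table.get("user42#2017-09-13"))  # {'events:action': 'purchase'}
# '$' sorts right after '#', so this range covers exactly user42's rows.
print([k for k, _ in table.scan("user42#", "user42$")])
# ['user42#2017-09-12', 'user42#2017-09-13']
```

The design choice that matters here is the row key: because the data is sorted by it, picking a key that groups the rows you read together turns most queries into a single lookup or one short scan.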