Apache HBase: Through the Looking Glass!
Overall Satisfaction with HBase
- Apache HBase is a widely used, Java-based, distributed NoSQL data store that runs on Apache Hadoop.
- While interest and effort in in-memory computing have been growing, organizations across domains have already invested in Apache Hadoop (or Hadoop vendor variants), so that is a large market.
- I worked with HBase on applications that needed strong consistency and had to interact with Apache Hadoop.
- You may encounter issues such as "region is not online" errors, NotServingRegionException, region servers going down, or out-of-memory errors.
- Because HBase works with ZooKeeper, care must be taken to set ZooKeeper up correctly. Most issues usually trace back to environment setup, configuration, shared load on the system, or maintenance.
- When evaluated against other NoSQL variants, performance across workloads was not best in class. This is acceptable most of the time, but it can be improved.
- If you use Apache HBase and want to upgrade it to get new features, you may need to check compatibility between your Apache Hadoop and Apache HBase versions; there are dependencies to think about.
- In the HBase master-slave design, the master becomes a single point of failure, which may not be a preferred design. As I used it, it was not a highly available system.
- Last I checked, it did not have well-tested, easy integrations with Spark; adding them would help.
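Transient errors like "region is not online" often clear up once the region is reassigned, so a caller can retry with a growing delay instead of failing immediately. The HBase Java client already retries internally (tunable through settings such as hbase.client.retries.number and hbase.client.pause), so the following is only a minimal, library-independent sketch of the idea. The names TransientRegionError and call_with_backoff are hypothetical and are not part of any HBase API.

```python
import time


class TransientRegionError(Exception):
    """Hypothetical stand-in for a transient error such as NotServingRegionException."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff.

    operation    -- zero-argument callable that performs the read or write
    max_attempts -- total number of tries before giving up
    base_delay   -- delay in seconds before the second attempt; doubles each retry
    sleep        -- injectable sleep function (useful for tests)
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientRegionError:
            if attempt == max_attempts:
                # Give up: the region did not come back within the retry budget.
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Only the designated transient error is retried; anything else (for example a genuine out-of-memory condition or a misconfiguration) propagates immediately, since retrying would not help there.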
Both scalability and response times are reasonable in HBase once it is tuned correctly. I hear the latest version, HBase 2.0, is even better, though that needs to be evaluated.
However, this decision depends entirely on the project and on what we are trying to solve.
- What is the application's inherent need? Does this component fit well in the design?
- Does it provide high data security?
- How does it assure there is no
- How can I make sure it is a highly available system, with no downtime for the customer?
- Does it give me the best linear scalability?
- What kind of tuning parameters does it allow the user to configure?
- How does it stack up against other variants on features, scalability, ease of use and contribution, and maturity of the product?
- What throughput can it attain under different kinds of workloads?