December 13, 2018
Score 8 out of 10
Overall Satisfaction with HBase
HBase is used as part of the company's main revenue-generating platform. We use it to store data that is processed with MapReduce to generate location information for our advertising business and for location analytics. Storage-wise, it made sense to choose HBase over Cassandra, and the same was true for read performance on Avro data containing geospatial information.
- Excellent for read performance
- Great storage for the Avro file format
- Easy integration with MapReduce
- Replication ability
- Write performance
- Performance with the Parquet file format: it is supported, but performance-wise it is still not there
- API/library availability for Spark, rather than having to create a new library for it
- The negative ROI has come from hardware usage. Under frequent use, we have had constant disk failures, which require HDD replacements.
- When disks fail, HA is available, but only to a certain extent.
- Large datasets helped mitigate causality issues.
Cassandra is great for writes, but with large datasets, depending on the case, it is not as good as HBase. Cassandra now supports Parquet, while HBase still has performance issues with it. Cassandra can be used for time-series use cases; there, HBase fails miserably. For geospatial data, HBase does work to an extent. HA is almost the same between the two.
HBase is open source, so we will keep using it in any case. If it were made into a commercial product, there is a strong possibility we would stop using HBase and switch to something else, most likely Cassandra. HBase does scale if it is set up correctly, and it will perform if it is used correctly. I would recommend it.
It depends on the use case. HBase works really well if your schema doesn't need relational features. Running transactional workloads on it is not a good idea, and it is also a poor fit for relational analytics and for edge network data. If you are working with petabytes of data, HBase is best suited for that case as well.