TrustRadius: an HG Insights company

Apache HBase

Score: 7.3 out of 10

32 Reviews and Ratings

What is Apache HBase?

The Apache HBase project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable.
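
As a rough illustration of what "versioned" and "non-relational" mean in a Bigtable-style model, the sketch below keeps each cell under a row key and a "family:qualifier" column name, with several timestamped versions per cell. It is a toy in-memory stand-in in plain Python, not the HBase client API; the class name `TinyTable`, its methods, and the sample data are all hypothetical.

```python
from collections import defaultdict


class TinyTable:
    """Toy in-memory stand-in for a Bigtable-style table (not the HBase API)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, ts):
        cells = self._rows[row][column]
        cells.append((ts, value))
        # Keep the newest versions first and drop anything beyond max_versions.
        cells.sort(key=lambda cell: cell[0], reverse=True)
        del cells[self.max_versions:]

    def get(self, row, column, versions=1):
        """Return up to `versions` values for a cell, newest first."""
        if row not in self._rows or column not in self._rows[row]:
            return []
        return [value for _, value in self._rows[row][column][:versions]]


table = TinyTable(max_versions=2)
table.put("user#42", "info:email", "old@example.com", ts=1)
table.put("user#42", "info:email", "new@example.com", ts=2)
table.put("user#42", "info:city", "Oslo", ts=2)  # no other row needs this column

latest = table.get("user#42", "info:email")
history = table.get("user#42", "info:email", versions=2)
```

Here `latest` holds only the newest value, while `history` returns both stored versions. Rows carry only the columns that were actually written, which is why very wide, sparse tables are practical in this model.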

Top Performing Features

  • Scalability

    NoSQL databases are inherently more scalable than relational databases and have built-in support for replication and partitioning of data.

    Category average: 9.4

  • Deployment model flexibility

    Can be deployed on-premise or in the cloud.

    Category average: 8.9

  • Availability

    Availability is the probability that the NoSQL database will be available to perform its function when called upon.

    Category average: 8.9

Areas for Improvement

  • Performance

    How fast the database performs under data load.

    Category average: 9.2

  • Data model flexibility

    NoSQL databases do not rely on tables, columns, rows, or schemas to organize and retrieve data, but use more flexible data models to accommodate the large volume and variety of data being generated by modern applications.

    Category average: 9

  • Concurrency

    Concurrency is the ability for multiple processes to access or change shared data simultaneously. The greater the number of concurrent user processes that can execute without blocking each other, the greater the concurrency of the database system.

    Category average: 9

HBase as the brother of big data

Pros

  • HBase stores big data well and is horizontally scalable.
  • Security is another major reason: the HBase database can be secured using Atlas and Ranger.
  • Stores data in any format: structured, semi-structured, and unstructured.
  • Consistency: HBase provides strongly consistent reads and writes. We use it for high-speed requirements when we do not need RDBMS-supported features such as full transaction support or typed columns.

Cons

  • There are very few commands in HBase.
  • Stored procedure functionality is not available and should be implemented.
  • HBase is CPU- and memory-intensive with large sequential input or output access, whereas MapReduce jobs are primarily input- or output-bound with fixed memory. Integrating HBase with MapReduce jobs results in unpredictable latencies.

Return on Investment

  • Positive: Open source, easy to use, good to store big data.
  • Negative: SQL functionalities are not available.
  • More memory utilization
  • More troubleshooting

Alternatives Considered

Imonggo, CouchDB and Cassandra

Other Software Used

Apache Hive, Apache Spark, Hadoop, Microsoft SQL Server, Splice Machine

An Amazing Experience

Pros

  • Scalable and truly non-relational data
  • HBase operations run in real time directly on its database rather than as MapReduce jobs
  • Scales linearly to support billions of rows with millions of columns

Cons

  • HBase is difficult to understand for people who are building custom tools for SQL-like purposes
  • Cannot be used for transactional datasets

Return on Investment

  • As HBase is a NoSQL database, it has no transaction support, and many operations on the data are not possible.
  • The lack of a primary key or composite primary key is an issue, because the architecture cannot be defined in the same way as a legacy system. The transaction concept also does not apply here.
  • The way data is printed on the console is not very user-friendly, so we had to use an abstraction over HBase (e.g., Apache Phoenix), which means one more component to manage.

Alternatives Considered

Cassandra and MongoDB

Other Software Used

Apache Solr, Apache Spark, Hadoop

HBASE!!!

Pros

  • Excellent for read performance
  • Stores the Avro file format well
  • Easy integration with MapReduce
  • Replication ability

Cons

  • Write performance
  • Performance with the Parquet file format. It is supported, but performance is not there yet
  • API/library availability for Spark, so that users do not have to create a new library for it

Return on Investment

  • Negative ROI has come from hardware usage. With frequent use, we have had constant disk failures that required HDD replacements.
  • High availability covers disk failures, but only to a certain extent.
  • Large datasets helped mitigate causality issues.

Alternatives Considered

Cassandra

Other Software Used

Cassandra, Apache Solr, Elasticsearch, Apache Spark, PostgreSQL, MariaDB, Amazon DynamoDB, Azure Cosmos DB

No SQL Database to Support Near Real Time Analytics

Pros

  • Faster lookup of records using row keys. It helped us fetch thousands of records much faster.
  • As it is a columnar data store, it helped us improve query performance and aggregations.
  • Sharding helps us optimize data storage and retrieval. HBase provides automatic or manual sharding of tables.
  • Dynamic addition of columns and column families helped us modify the schema with ease.
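
The sharding mentioned above works on contiguous row-key ranges: each region serves the keys from its start key (inclusive) up to its end key (exclusive), and a table can be pre-split at chosen keys or left to split as regions grow. A minimal sketch of routing a row key to its range, written in plain Python rather than against the HBase API; the split keys and the `region_for` helper are hypothetical.

```python
from bisect import bisect_right

# Hypothetical split keys: they cut the key space into four contiguous ranges.
SPLIT_KEYS = ["g", "n", "t"]


def region_for(row_key, split_keys=SPLIT_KEYS):
    """Return the index of the range that contains row_key.

    Range i covers keys from split_keys[i - 1] (inclusive) up to
    split_keys[i] (exclusive); the first and last ranges are open-ended.
    """
    return bisect_right(split_keys, row_key)


assignments = {key: region_for(key) for key in ["apple", "g", "mango", "zebra"]}
```

Because the lookup is a binary search over the split keys, finding the right range stays cheap as the number of ranges grows. It also shows why sequential row keys tend to land in the same range, which is worth keeping in mind when designing keys.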

Cons

  • We identified issues with HMaster when handling a huge number of nodes.
  • Cannot have multiple indexes, as the row key is the only column that can be indexed.
  • HBase does not support partial row keys, which limits its query performance.
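
How much the row-key limitation hurts depends on which part of the key is known. Rows are stored sorted by row key, so a lookup on a leading prefix can seek to the first match and read sequentially, while a match on a later part of a composite key has to examine every key unless a separate index layer is added. The toy sketch below shows the difference in plain Python; it is not HBase client code, and the `<customer>#<date>` key layout and both helper functions are hypothetical.

```python
from bisect import bisect_left

# Row keys kept in sorted order, as a sorted key-value store would hold them.
# The "<customer>#<date>" layout is a hypothetical composite key.
ROW_KEYS = sorted([
    "cust001#2024-01-03",
    "cust001#2024-02-11",
    "cust002#2024-01-07",
    "cust010#2024-01-01",
])


def prefix_scan(sorted_keys, prefix):
    """Yield keys starting with prefix by seeking, then reading sequentially."""
    start = bisect_left(sorted_keys, prefix)
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order means no later key can match
        yield key


def suffix_filter(sorted_keys, suffix):
    """Matching on a non-leading part of the key has to examine every key."""
    return [key for key in sorted_keys if key.endswith(suffix)]


by_customer = list(prefix_scan(ROW_KEYS, "cust001#"))
by_date = suffix_filter(ROW_KEYS, "#2024-01-07")
```

The practical consequence is that the field queried most often should come first in a composite row key.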

Return on Investment

  • It supports near real-time use cases when integrated with Spark Streaming.
  • It helps store a huge volume of records with consistent reads and writes.
  • Maintenance is the pain point, as the region servers and nodes require ongoing maintenance and monitoring.

Alternatives Considered

MongoDB, MySQL and Teradata Database

Other Software Used

Apache Hive, Apache Spark, Looker

HBase, The Only Enterprise NoSQL Choice

Pros

  • Scalability. HBase can scale to trillions of records.
  • Fast. HBase is extremely fast to scan values or retrieve individual records by key.
  • HBase can be accessed by standard SQL via Apache Phoenix.
  • Integrated. I can easily store and retrieve data from HBase using Apache Spark.
  • It is easy to set up DR and backups.
  • Ingest. It is easy to ingest data into HBase via shell, Java, Apache NiFi, Storm, Spark, Flink, Python and other means.

Cons

  • Not for small data
  • Requires a cluster

Return on Investment

  • It is affordable, so it saves money
  • It scales, so it allows for storage of everything, saving valuable data
  • It removes the need for expensive proprietary data stores
  • It saves money by allowing for offload from expensive RDBMS and paid storage

Alternatives Considered

MongoDB

Other Software Used

Apache Hive, Apache Spark, TensorFlow