The Apache HBase project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable.
N/A
SingleStore
Score 8.4 out of 10
N/A
SingleStore aims to enable organizations to scale from one to one million customers, handling SQL, JSON, full text and vector workloads in one unified platform.
Hbase is well suited for large organizations with millions of operations performing on tables, real-time lookup of records in a table, range queries, random reads and writes and online analytics operations. Hbase cannot be replaced for traditional databases as it cannot support all the features, CPU and memory intensive. Observed increased latency when using with MapReduce job joins.
Good for Applications needing instant insights on large, streaming datasets. Applications processing continuous data streams with low latency. When a multi-cloud, high-availability database is required When NOT to Use Small-scale applications with limited budgets Projects that do not require real-time analytics or distributed scaling Teams without experience in distributed databases and HTAP architectures.
Stored procedures functionality is not available so it should be implemented.
HBase is CPU and Memory intensive with large sequential input or output access while as Map Reduce jobs are primarily input or output bound with fixed memory. HBase integrated with Map-reduce jobs will result in random latencies.
It does not release a patch to have back porting; it just releases a new version and stops support; it's difficult to keep up to that pace.
Support engineers lack expertise, but they seem to be improving organically.
Lacks enterprise CDC capability: Change data capture (CDC) is a process that tracks and records changes made to data in a database and then delivers those changes to other systems in real time.
For enterprise-level backup & restore capability, we had to implement our model via Velero snapshot backup.
There's really not anything else out there that I've seen comparable for my use cases. HBase has never proven me wrong. Some companies align their whole business on HBase and are moving all of their infrastructure from other database engines to HBase. It's also open source and has a very collaborative community.
[Until it is] supported on AWS ECS containers, I will reserve a higher rating for SingleStore. Right now it works well on EC2 and serves our current purpose, [but] would look forward to seeing SingleStore respond to our urge of feature in a shorter time period with high quality and security.
SingleStore excels in real-time analytics and low-latency transactions, making it ideal for operational analytics and mixed workloads. Snowflake shines in batch analytics and data warehousing with strong scalability for large datasets. SingleStore offers faster data ingestion and query execution for real-time use cases, while Snowflake is better for complex analytical queries on historical data.
The support deep dives into our most complexed queries and bizarre issues that sometimes only we get comparing to other clients. Our special workload (thousands of Kafka pipelines + high concurrency of queries). The response match to the priority of the request, P1 gets immediate return call. Missing features are treated, they become a client request and being added to the roadmap after internal consideration on all client needs and priority. Bugs are patched quite fast, depends on the impact and feasible temporary workarounds. There is no issue that we haven't got a proper answer, resolution or reasoning
We allowed 2-3 months for a thorough evaluation. We saw pretty quickly that we were likely to pick SingleStore, so we ported some of our stored procedures to SingleStore in order to take a deeper look. Two SingleStore people worked closely with us to ensure that we did not have any blocking problems. It all went remarkably smoothly.
Cassandra os great for writes. But with large datasets, depending, not as great as HBASE. Cassandra does support parquet now. HBase still performance issues. Cassandra has use cases of being used as time series. HBase, it fails miserably. GeoSpatial data, Hbase does work to an extent. HA between the two are almost the same.
Greenplum is good in handling very large amount of data. Concurrency in Greenplum was a major problem. Features available in SingleStore like Pipelines and in memory features are not available in Greenplum. Gemfire was not scaling well like SingleStore. Support of both Greenplum and Gemfire was not good. Product team did not help us much like the ones in SingleStore who helped us getting started on our first cluster very fast.
As Hbase is a noSql database, here we don't have transaction support and we cannot do many operations on the data.
Not having the feature of primary or a composite primary key is an issue as the architecture to be defined cannot be the same legacy type. Also the transaction concept is not applicable here.
The way data is printed on console is not so user-friendly. So we had to use some abstraction over HBase (eg apache phoenix) which means there is one new component to handle.
As the overall performance and functionality were expanded, we are able to deliver our data much faster than before, which increases the demand for data.
Metadata is available in the platform by default, like metadata on the pipelines. Also, the information schema has lots of metadata, making it easy to load our assets to the data catalog.