Apache Hive is database/data warehouse software that supports data querying and analysis of large datasets stored in the Hadoop distributed file system (HDFS) and other compatible systems, and is distributed under an open source license.
N/A
ClickHouse
Score 7.5 out of 10
N/A
ClickHouse is an open-source, column-oriented OLAP database system enabling real-time analytical reports using SQL queries. With linear scalability, it handles trillions of rows and petabytes of data. ClickHouse Cloud offers a scalable serverless solution for real-time analytics.
N/A
Pricing
Apache Hive
ClickHouse
Editions & Modules
No answers on this topic
No answers on this topic
Offerings
Pricing Offerings
Apache Hive
ClickHouse
Free Trial
No
Yes
Free/Freemium Version
No
Yes
Premium Consulting/Integration Services
No
Yes
Entry-level Setup Fee
No setup fee
Optional
Additional Details
—
Pay for what is used:
It automatically scales up and down compute resources based on the user's workload
It scales storage and compute separately
It automatically scales unused resources down to zero so that users don’t pay for idle services
Software work execution is on a large scale, it is good to use for new projects or organizational changes, data lineage mapping has always been dubious but this one has had good results. You can store and synchronize data from different departments, the storage process can be manual but it is best automated.
The most important thing when using ClickHouse is to be clear that the scenarios in which you want to use it really are the right ones. Many users think that when a database is very fast for a specific use case, it can be extrapolated to other contexts (most of the time different) in which a previous analysis has not been carried out.
ClickHouse is an analytical database, as such, it should be used for such purposes, where the information is stored correctly, the data volumes are really large and the queries to be performed are not the typical traditional queries on several columns with multiple aggregations. ClickHouse is not the solution for this.
On the other hand, if your case is not one of the above, it is quite possible that ClickHouse can help you. Where ClickHouse shines is when you are looking for aggregation over a particular column in large volumes of data.
Apache Hive allows use to write expressive solutions to complex problems thanks to its SQL-like syntax.
Relatively easy to set up and start using.
Very little ramp-up to start using the actual product, documentation is very thorough, there is an active community, and the code base is constantly being improved.
Their MergeTree table engine provide impressive performance for data insert in bulk
Not only data insert but also the way MergeTree engine uses Primary Keys to sort the data and perform data skipping based on the granules its also their secret for ridiculous fast queries
Data compression its also great
They provide especial table engines that allow you to read data directly from other sources like S3
Since its written with C++ you have very granular data types and especial ones like enum, LowCardinality and etc, they save you a lot of storage since are stored as integer values
ClickHouse functions besides the ones that respect ANSI Standards are also awesome and useful
Hive is a very good big data analysis and ad-hoc query platform, which supports scaling also. The BI processes can be easily integrated with Hadoop via the Hive. It can deal with a much larger data set that traditional RDBMS can not. It is a "must-have" component of the big data domain.
Apache Hive is a FOSS project and its open source. We need not definitely comment on anything about the support of open source and its developer community. But, it has got tremendous developer support, awesome documentation. I would justify the fact that much support can be gathered from the community backup.
Besides Hive, I have used Google BigQuery, which is costly but have very high computation speed. Amazon Redshift is the another product, I used in my recent organisation. Both Redshift and BigQuery are managed solution whereas Hive needs to be managed
ClickHouse outperforms, especially in costs, since its compression/indexing engines are so smart, and even with very low computing power, you can already perform huge analyses of the data.