Apache Hive is database/data warehouse software that supports data querying and analysis of large datasets stored in the Hadoop distributed file system (HDFS) and other compatible systems, and is distributed under an open source license.
N/A
Apache Kafka
Score 8.5 out of 10
N/A
Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation written in Scala and Java. The Kafka event streaming platform is used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
N/A
Pricing
Apache Hive
Apache Kafka
Editions & Modules
No answers on this topic
No answers on this topic
Offerings
Pricing Offerings
Apache Hive
Apache Kafka
Free Trial
No
No
Free/Freemium Version
No
No
Premium Consulting/Integration Services
No
No
Entry-level Setup Fee
No setup fee
No setup fee
Additional Details
—
—
More Pricing Information
Community Pulse
Apache Hive
Apache Kafka
Considered Both Products
Apache Hive
Verified User
Engineer
Chose Apache Hive
Apache Spark is similar in the sense that it too can be used to query and process large amounts of data through its Dataframe interface. Hive is better for short-term querying while Spark is better for persistent and long-term analysis. Another product is Impala. For our …
Besides Hive, I have used Google BigQuery, which is costly but have very high computation speed. Amazon Redshift is the another product, I used in my recent organisation. Both Redshift and BigQuery are managed solution whereas Hive needs to be managed
Software work execution is on a large scale, it is good to use for new projects or organizational changes, data lineage mapping has always been dubious but this one has had good results. You can store and synchronize data from different departments, the storage process can be manual but it is best automated.
Apache Kafka is well-suited for most data-streaming use cases. Amazon Kinesis and Azure EventHubs, unless you have a specific use case where using those cloud PaAS for your data lakes, once set up well, Apache Kafka will take care of everything else in the background. Azure EventHubs, is good for cross-cloud use cases, and Amazon Kinesis - I have no real-world experience. But I believe it is the same.
Apache Hive allows use to write expressive solutions to complex problems thanks to its SQL-like syntax.
Relatively easy to set up and start using.
Very little ramp-up to start using the actual product, documentation is very thorough, there is an active community, and the code base is constantly being improved.
Really easy to configure. I've used other message brokers such as RabbitMQ and compared to them, Kafka's configurations are very easy to understand and tweak.
Very scalable: easily configured to run on multiple nodes allowing for ease of parallelism (assuming your queues/topics don't have to be consumed in the exact same order the messages were delivered)
Not exactly a feature, but I trust Kafka will be around for at least another decade because active development has continued to be strong and there's a lot of financial backing from Confluent and LinkedIn, and probably many other companies who are using it (which, anecdotally, is many).
Sometimes it becomes difficult to monitor our Kafka deployments. We've been able to overcome it largely using AWS MSK, a managed service for Apache Kafka, but a separate monitoring dashboard would have been great.
Simplify the process for local deployment of Kafka and provide a user interface to get visibility into the different topics and the messages being processed.
Learning curve around creation of broker and topics could be simplified
Hive is a very good big data analysis and ad-hoc query platform, which supports scaling also. The BI processes can be easily integrated with Hadoop via the Hive. It can deal with a much larger data set that traditional RDBMS can not. It is a "must-have" component of the big data domain.
Apache Kafka is highly recommended to develop loosely coupled, real-time processing applications. Also, Apache Kafka provides property based configuration. Producer, Consumer and broker contain their own separate property file
Apache Hive is a FOSS project and its open source. We need not definitely comment on anything about the support of open source and its developer community. But, it has got tremendous developer support, awesome documentation. I would justify the fact that much support can be gathered from the community backup.
Support for Apache Kafka (if willing to pay) is available from Confluent that includes the same time that created Kafka at Linkedin so they know this software in and out. Moreover, Apache Kafka is well known and best practices documents and deployment scenarios are easily available for download. For example, from eBay, Linkedin, Uber, and NYTimes.
Besides Hive, I have used Google BigQuery, which is costly but have very high computation speed. Amazon Redshift is the another product, I used in my recent organisation. Both Redshift and BigQuery are managed solution whereas Hive needs to be managed
I used other messaging/queue solutions that are a lot more basic than Confluent Kafka, as well as another solution that is no longer in the market called Xively, which was bought and "buried" by Google. In comparison, these solutions offer way fewer functionalities and respond to other needs.
Positive: Get a quick and reliable pub/sub model implemented - data across components flows easily.
Positive: it's scalable so we can develop small and scale for real-world scenarios
Negative: it's easy to get into a confusing situation if you are not experienced yet or something strange has happened (rare, but it does). Troubleshooting such situations can take time and effort.