Apache Hive Review
September 22, 2020

Partha Protim Pegu | TrustRadius Reviewer
Score 6 out of 10

Overall Satisfaction with Apache Hive

We currently use Apache Hive as a data warehousing tool to run SQL queries across multiple databases and file systems. Because we also use Hadoop, we later integrate those queries with Hadoop for data processing. It is a great tool that can be used in place of MySQL, and we are happily using it on multiple projects.
Pros
  • Integrates well with Hadoop
  • Multiple users can query data simultaneously
  • Converts between a variety of data formats within Hive
  • Makes ETL easy
Cons
  • Suited only to OLAP workloads, not OLTP
  • Limited subquery support
Return on investment
  • Lower storage costs in IT infrastructure
  • Lower data warehousing costs
  • Lower churn rate
  • Better customer acquisition
There seems to be a tug of war within the Hadoop ecosystem over which processing engine to use. I think Apache Spark is better than Hive in terms of speed, and it provides a modern alternative to MapReduce.
Because Amazon EMR uses Apache Tez, it is also faster than Hive running on MapReduce. Still, I would go for Hive when it comes to batch processing.
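For anyone who wants to compare engines on their own cluster, Hive selects its execution engine through the `hive.execution.engine` property, which accepts `mr`, `tez`, or `spark`. Here is a minimal sketch in Python that builds the session statement. The helper name `engine_statement` is hypothetical, and the snippet only produces the HiveQL text; it does not connect to a cluster.

```python
# Values accepted by Hive's hive.execution.engine property.
VALID_ENGINES = ("mr", "tez", "spark")


def engine_statement(engine: str) -> str:
    """Return the HiveQL SET statement that selects an execution engine.

    `engine_statement` is a hypothetical helper for illustration; it only
    formats text and never talks to a Hive server.
    """
    name = engine.strip().lower()
    if name not in VALID_ENGINES:
        raise ValueError(
            f"unknown engine {engine!r}; expected one of {VALID_ENGINES}"
        )
    return f"SET hive.execution.engine={name};"


if __name__ == "__main__":
    # Statement to run at the start of a Hive session, before the batch queries.
    print(engine_statement("tez"))
```

Running the same batch query once under `mr` and once under `tez` is a simple way to see whether the speed gap matters for a given workload.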
Hive also has a community of its own, just like other Hadoop projects. Most queries and problems are resolved within the community itself: we can post a problem or get in touch with a specific user and get the issue resolved. Otherwise, there is always the product support team.
Hive is a great tool for running queries on large volumes of tabular data. It is easy to use. Our team uses it mostly for running complex join queries. But again, speed is the differentiating factor: MapReduce is not that fast compared to the alternatives.

Do you think Apache Hive delivers good value for the price?

Yes

Are you happy with Apache Hive's feature set?

Yes

Did Apache Hive live up to sales and marketing promises?

No

Did implementation of Apache Hive go as expected?

Yes

Would you buy Apache Hive again?

No

If you have a lot of tabular data, then Apache Hive is a good option for storing it and running queries on it. The results can later be used for batch processing. Hive can also be used for archiving old data from RDBMS tables, which is much cheaper. It is also well suited to expensive queries such as big joins. I would not suggest Hive if you have a lot of subqueries to run.
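When a query leans on subqueries, one common workaround is to rewrite the filter as a join. The sketch below shows the idea with Python's built-in `sqlite3` module standing in for Hive, so it runs without a cluster. The `orders` and `vip_customers` tables are made up for illustration, and the rewrite assumes `customer_id` is unique in `vip_customers`.

```python
import sqlite3

# SQLite stands in for Hive here so the example runs anywhere;
# the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE vip_customers (customer_id INTEGER);
    INSERT INTO orders VALUES (1, 10, 25.0), (2, 11, 40.0), (3, 10, 15.0), (4, 12, 60.0);
    INSERT INTO vip_customers VALUES (10), (12);
    """
)

# Subquery form: keep only orders whose customer appears in vip_customers.
subquery_sql = """
    SELECT order_id
    FROM orders
    WHERE customer_id IN (SELECT customer_id FROM vip_customers)
    ORDER BY order_id
"""

# Join form: the same filter as an inner join. This matches the subquery
# only because customer_id is assumed unique in vip_customers.
join_sql = """
    SELECT o.order_id
    FROM orders o
    JOIN vip_customers v ON o.customer_id = v.customer_id
    ORDER BY o.order_id
"""

subquery_rows = [row[0] for row in conn.execute(subquery_sql)]
join_rows = [row[0] for row in conn.execute(join_sql)]
print(subquery_rows, join_rows)
```

Both queries return the same order IDs, so the join version can replace the subquery wherever the uniqueness assumption holds.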