Walk into the World of Big Data with Apache Hive
June 02, 2021

akshay kashyap | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Hive

We use Apache Hive on an on-premise big data setup built on Cloudera servers. Our use case for Apache Hive is efficient querying over a distributed system: queries run faster thanks to parallel execution. We store our metrics, such as user info, purchase history, transactions, and preferences, in HDFS, and use Apache Hive to query that data and run analytics whose output we display.
  • Simple query language built on top of the MapReduce paradigm.
  • Provides parallel execution over distributed system.
  • Tabular format and connectors available for all cloud platforms.
  • Complex joins may take time to execute due to shuffling of data.
  • Static queries mostly.
  • Slower than Apache Spark by almost 100 times.
  • Dependent on external memory and storage to execute.
  • Easy to set up and maintain for on-premise queries.
  • Distributed and parallel processing.
  • Anyone that is familiar with SQL can use Apache Hive and learn it very quickly.
  • Improved performance over a traditional DBMS.
  • Scalability is much better thanks to HDFS's support for distributed processing.
  • Made queries much more efficient than on a traditional database such as Oracle.
  • Having to maintain on-premise hardware is one dependency.
Apache Hive was developed by Facebook to query large distributed datasets. It is a query engine that runs on top of HDFS, so it uses the resources of the Hadoop setup, whereas Apache Spark is an in-memory compute engine, which is why it is much faster than Apache Hive. Apache HBase mostly deals with unstructured data, while Apache Hive is suited to structured data stored on HDFS.

Did Apache Hive live up to sales and marketing promises?

I wasn't involved with the selection/purchase process

You can use Apache Hive to query a large data warehouse whose records are updated or appended either in batch or in real time. Hive queries can return output in the format you need, which you can then feed into a reporting tool such as Tableau or consume directly from Python.
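
To illustrate the Python route mentioned above, here is a minimal sketch, not our production code. It assumes a Python DB-API cursor (for example one obtained from the third-party PyHive package) and writes the query result to CSV, a format that reporting tools such as Tableau can read. The table name `purchase_history`, its columns, the host name, and the helper `export_query_to_csv` are all hypothetical.

```python
import csv

# Hypothetical table and column names; adjust to your own schema.
QUERY = """
SELECT user_id, SUM(amount) AS total_spent
FROM purchase_history
GROUP BY user_id
"""


def export_query_to_csv(cursor, query, out):
    """Run `query` on a DB-API cursor and write a header plus rows as CSV to `out`.

    Returns the number of data rows written.
    """
    cursor.execute(query)
    # DB-API: cursor.description has one entry per column; item 0 is the column name.
    header = [column[0] for column in cursor.description]
    writer = csv.writer(out)
    writer.writerow(header)
    row_count = 0
    for row in cursor.fetchall():
        writer.writerow(row)
        row_count += 1
    return row_count


# Possible usage against a HiveServer2 instance (assumes PyHive is installed
# and that the host, port, and user below are replaced with real values):
#
#   from pyhive import hive
#   conn = hive.Connection(host="hive.example.internal", port=10000, username="analyst")
#   with open("total_spent.csv", "w", newline="") as f:
#       export_query_to_csv(conn.cursor(), QUERY, f)
```

Because the helper only relies on `execute`, `description`, and `fetchall`, it should work with any DB-API-style cursor, so the Hive connection details stay outside the export logic.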