Overall Satisfaction with Apache Hive
We use Apache Hive in an on-premise big data setup built on Cloudera servers. Our use case for Apache Hive is efficient querying over a distributed system, where parallel execution makes queries run faster. We store metrics such as user info, purchase history, transactions, and preferences in HDFS, then use Apache Hive to query that data and run analytics to display the output.
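To make that use case concrete, here is a minimal sketch in plain Python (not Hive itself) of how an aggregation such as `SELECT user_id, SUM(amount) FROM purchases GROUP BY user_id` can be broken into map, shuffle, and reduce phases, which is how Hive runs queries on a classic MapReduce backend. The table name, column names, and sample rows are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical purchase-history rows: (user_id, amount).
# In a real setup these would be files in HDFS exposed as a Hive table.
ROWS = [
    ("alice", 30.0),
    ("bob", 12.5),
    ("alice", 20.0),
    ("carol", 7.5),
    ("bob", 2.5),
]


def map_phase(rows):
    """Emit one (key, value) pair per input row."""
    for user_id, amount in rows:
        yield user_id, amount


def shuffle_phase(pairs):
    """Group values by key; on a cluster this step moves data between nodes."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate each key's values, here with SUM."""
    return {key: sum(values) for key, values in grouped.items()}


def total_spent_per_user(rows):
    """Chain the three phases, mimicking a GROUP BY ... SUM query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(total_spent_per_user(ROWS))
```

On a real cluster the map and reduce steps run in parallel across nodes, and the shuffle step is where data moves over the network, which is why shuffle-heavy queries tend to be the slow ones.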
- Simple query language built on top of the MapReduce paradigm.
- Provides parallel execution over a distributed system.
- Tabular format and connectors available for all cloud platforms.
- Complex joins may take time to execute due to shuffling of data.
- Mostly suited to static queries.
- Can be almost 100 times slower than Apache Spark.
- Depends on external memory and storage to execute.
- Easy to set up and maintain for on-premise querying.
- Distributed and parallel processing.
- Anyone who is familiar with SQL can use Apache Hive and learn it very quickly.
- Improved performance compared with a traditional DBMS.
- Scalability is much better thanks to HDFS-backed distributed processing.
- Made queries much more efficient than on a traditional database such as Oracle.
- One dependency is having to maintain on-premise hardware.
Apache Hive provides a SQL-like query language, originally developed at Facebook, for querying large distributed datasets. It is a query engine that runs on top of HDFS, so it uses the resources of the Hadoop setup, while Apache Spark is an in-memory compute engine, which is why it is much faster than Apache Hive. Apache HBase mostly deals with unstructured data, whereas Apache Hive is suited to structured data stored on HDFS.
Do you think Apache Hive delivers good value for the price?
Yes
Are you happy with Apache Hive's feature set?
Yes
Did Apache Hive live up to sales and marketing promises?
I wasn't involved with the selection/purchase process
Did implementation of Apache Hive go as expected?
Yes
Would you buy Apache Hive again?
Yes