Item: Apache Hive
Rating: 9
Author: akshay kashyap

Use Cases and Deployment Scope

We are using Apache Hive over an on-premise big data setup built on top of Cloudera servers. We chose Apache Hive because it queries data efficiently across a distributed system and runs queries faster through parallel execution. We store data such as user info, purchase history, transactions, and preferences in HDFS, then use Apache Hive to query it and run analytics whose output we display.
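As a rough illustration of the setup described above, the sketch below builds the two kinds of HiveQL statement such a workflow typically needs: one that maps delimited files already sitting in HDFS to an external table, and one that aggregates over it. The table name `purchases`, its columns, and the path `/data/purchases` are hypothetical and not taken from our deployment.

```python
def external_table_ddl(table, columns, hdfs_path, delimiter=","):
    """Build a HiveQL statement mapping delimited files in HDFS to a table.

    `columns` is a list of (name, hive_type) pairs. All names used in the
    demo below are hypothetical.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_path}'"
    )


def spend_per_user_query(table):
    """Build an aggregate query: total amount spent per user."""
    return (
        f"SELECT user_id, SUM(amount) AS total_spent\n"
        f"FROM {table}\n"
        f"GROUP BY user_id\n"
        f"ORDER BY total_spent DESC"
    )


# Hypothetical schema and HDFS location, for illustration only.
ddl = external_table_ddl(
    "purchases",
    [("user_id", "BIGINT"), ("item", "STRING"), ("amount", "DOUBLE")],
    "/data/purchases",
)
query = spend_per_user_query("purchases")
print(ddl)
print(query)
```

Because the table is external, Hive only records the schema and location; the files in HDFS stay where they are, and the aggregate query is what Hive turns into parallel jobs over them.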

Pros and Cons

Simple query language built on top of the MapReduce paradigm.
Provides parallel execution over a distributed system.
Tabular format and connectors available for all cloud platforms.

Complex joins may take time to execute due to shuffling of data.
Mostly static queries.
Can be slower than Apache Spark, by up to roughly 100 times.
Depends on external memory and storage to execute.
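The shuffling cost behind slow joins can be pictured with a toy model. In a shuffle (reduce-side) join, every row of both tables is routed to a partition by its join key before any matching happens, so the data moved grows with the size of both inputs. The sketch below simulates that in plain Python; it is not Hive code, and the function names and sample rows are made up for illustration.

```python
from collections import defaultdict


def shuffle(rows, key_index, num_partitions):
    """Route each row to a partition chosen by hashing its join key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key_index]) % num_partitions].append(row)
    return partitions


def reduce_side_join(left, right, num_partitions=4):
    """Join two lists of tuples on their first field, shuffle-join style.

    Returns the joined rows and the number of rows that had to be moved.
    """
    left_parts = shuffle(left, 0, num_partitions)
    right_parts = shuffle(right, 0, num_partitions)
    joined = []
    for p in range(num_partitions):
        # Each partition is joined independently, like one reducer.
        lookup = defaultdict(list)
        for row in right_parts.get(p, []):
            lookup[row[0]].append(row)
        for lrow in left_parts.get(p, []):
            for rrow in lookup.get(lrow[0], []):
                joined.append(lrow + rrow[1:])
    # Every input row crosses the "network" once in this model.
    rows_moved = len(left) + len(right)
    return joined, rows_moved


# Hypothetical sample data: (user_id, name) and (user_id, amount).
users = [(1, "ann"), (2, "bob")]
purchases = [(1, 30), (1, 12), (3, 5)]
joined, rows_moved = reduce_side_join(users, purchases)
print(sorted(joined), rows_moved)
```

Only two rows come out of the join, yet all five input rows are moved first. With several large tables joined in sequence, that movement is repeated at each step, which is where the time goes.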

Most Important Features

Easy to set up and maintain for on-premise queries.
Distributed and parallel processing.
Anyone that is familiar with SQL can use Apache Hive and learn it very quickly.

Return on Investment

Improved performance compared with a traditional DBMS.
Scalability is much better thanks to distributed processing on HDFS.
Made queries much more efficient than on a traditional database such as Oracle.
One dependency is that we have to maintain on-premise hardware.

Alternatives Considered

Apache Spark and Apache HBase

Apache Hive, with its SQL-like query language, was originally developed at Facebook to query large distributed datasets. It is a query engine that runs on top of HDFS, so it uses the resources of the Hadoop cluster, while Apache Spark is an in-memory compute engine, which is why it is much faster than Apache Hive. Apache HBase mostly deals with unstructured data, whereas Apache Hive is suited to structured data stored on HDFS.

Key Insights

Do you think Apache Hive delivers good value for the price?

Yes

Are you happy with Apache Hive's feature set?

Yes

Did Apache Hive live up to sales and marketing promises?

I wasn't involved with the selection/purchase process

Did implementation of Apache Hive go as expected?

Yes

Would you buy Apache Hive again?

Yes

Other Software Used

Apache Spark, Salesforce Marketing Cloud Interaction Studio (formerly Evergage + MyBuys), Apache Hadoop

Likelihood to Recommend

You can use Apache Hive to query a large data warehouse whose records are updated or appended either in batch or in real time. Hive queries can give you output in the format you need, which you can feed into a reporting tool such as Tableau or consume directly from Python.
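To make the Python route concrete, here is a minimal sketch that runs a query through a DB-API cursor and returns CSV text that a reporting tool could import. It assumes the Hive connection comes from a DB-API style library (PyHive is one such option); so that the example runs anywhere, an in-memory SQLite database stands in for Hive, and the `purchases` table and its values are made up.

```python
import csv
import io
import sqlite3


def query_to_csv(cursor, sql):
    """Run `sql` on a DB-API cursor and return the result as CSV text."""
    cursor.execute(sql)
    # cursor.description holds one entry per column; item 0 is its name.
    header = [col[0] for col in cursor.description]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()


# SQLite stands in for Hive here; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE purchases (user_id INTEGER, amount REAL)")
cur.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 30.0), (1, 12.5), (2, 5.0)],
)
report = query_to_csv(
    cur,
    "SELECT user_id, SUM(amount) AS total_spent "
    "FROM purchases GROUP BY user_id ORDER BY user_id",
)
print(report)
conn.close()
```

Against a real cluster, only the connection lines would change; `query_to_csv` relies on nothing beyond `execute`, `description`, and `fetchall`. For very large result sets, fetching in batches rather than with `fetchall` would be the safer choice.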

Walk into the World of Big Data with Apache Hive

Overall Satisfaction with Apache Hive